ABSTRACT 

To automate source detection, two-dimensional light-profile Sersic modelling and catalogue 
compilation in large survey applications, we introduce a new code GALAPAGOS, Galaxy 
Analysis over Large Areas: Parameter Assessment by GALFITting Objects from SExtractor. 
Based on a single setup, Galapagos can process a complete set of survey images. It detects 
sources in the data, estimates a local sky background, cuts postage stamp images for all 
sources, prepares object masks, performs Sersic fitting including neighbours and compiles 
all objects in a final output catalogue. For the initial source detection Galapagos applies 
SExtractor, while Galfit is incorporated for modelling Sersic profiles. It measures the 
background sky involved in the Sersic fitting by means of a flux growth curve. Galapagos 
determines postage stamp sizes based on SExtractor shape parameters. In order to obtain 
precise model parameters Galapagos incorporates a complex sorting mechanism and makes 
use of modern CPU's multiplexing capabilities. It combines SExtractor and Galfit data 
in a single output table. When incorporating information from overlapping tiles, GALAPAGOS 
automatically removes multiple entries from identical sources. Galapagos is programmed 
in the Interactive Data Language, IDL. We test the stability and the ability to properly 
recover structural parameters extensively with artificial image simulations. Moreover, we 
apply Galapagos successfully to the STAGES data set. For one-orbit HST data, a single 
2.2 GHz CPU processes about 1000 primary sources per 24 hours. Note that Galapagos 
results depend critically on the user-defined parameter setup. This paper provides useful 
guidelines to help the user make sensible choices. 
1 INTRODUCTION 

Imaging surveys provide a general tool to access the average 
properties of galaxy populations. A survey data set usually 
consists of an arrangement of primary images in one or several 
filters. These data are often accompanied by various supplementary 
data. Examples for such surveys are COMBO-17 
(Wolf et al. 2004), DEEP1/DEEP2 (Vogt et al. 2005), GOODS 
(Giavalisco et al. 2004), COSMOS (Scoville et al. 2007) or the 
Hubble Ultra Deep Field (Beckwith et al. 2006). 

Common to all imaging surveys are the specific reduction 
methods involved in the data analysis. After reducing the imag- 
ing data, which normally consists of a mosaic of many poten- 
tially (partly) overlapping tiles, scientific sources are detected 
and compiled in a source catalogue. Depending on the scientific 
goals, more sophisticated methods are then applied to analyse 
the morphology of the sources, i.e. quantify the structure of 
their light-profiles. Finally, the resulting additional structural 
parameters are added to the source catalogue. Somewhere in 
this process the source catalogue might (optionally) get cleaned 
from duplicate source entries or other artifacts. 

For the main task, source detection and extraction, the 
code SExtractor by Bertin & Arnouts (1996) has been 
widely used in astronomy. Based on a simple setup script SExtractor 
detects sources, estimates a background sky level, 
measures primary shape information, like position, position an- 
gle and axis ratio, and even performs aperture photometry. A 
key feature is the ability to properly deblend close compan- 
ion sources, while at the same time avoid breaking single large 
sources up into several pieces. Other features include a neural 
network to separate stars and galaxies or the option to associate 
the detected objects with a given list of input positions. SEx- 
tractor is designed with minimum user interaction, support 
for large images and high execution speeds in mind. 

In order to analyse galaxy light profiles quantitatively, 
many codes have been developed. The ones that are most widely 
used employ a two-dimensional fitting method to model ellipsoidal 
radial profiles, and include convolution with a point 
spread function (PSF). 

One of these codes is Gim2d, which was first employed 
as part of an Iraf pipeline to analyse survey imaging data 
(Simard et al. 2002). Based on a Metropolis algorithm to find 
the minimum in χ²-space, Gim2d mainly uses the Sersic profile 
(Sersic 1968), which is a general expression that includes 
both the de Vaucouleurs and exponential forms (see Sec. 3.5 
and eq. [1]). The minimisation method performs a global parameter 
space search. As a result, Gim2d is robust; however, it 
requires large amounts of CPU time compared to other codes 
(e.g. Haussler et al. 2007). 

Another application for modelling light profiles is Budda 
(de Souza, Gadotti & dos Anjos 2004). Budda was initially developed 
to perform bulge/disc decomposition. However, it has 
recently been updated to include also bar and central point 
source modelling. Moreover, it now also features a double exponential 
profile for discs. 

Finally, a rather versatile and effective method was presented 
by Peng et al. (2002, 2010): Galfit. Like the aforementioned 
programmes, it is a two-dimensional fitting code to extract 
structural components from galaxy images. It is designed 
to model galaxies in as flexible a manner as possible, by allowing 
the user to fit any number of components and functional 
forms. Galfit can therefore fit not only simple situations, but 
also more complicated setups including bulge, disk, bar, halo, etc. 
This freedom has the major advantage that not only the object of 
prime interest may be fitted, but also the neighbouring sources at 
the same time, as some situations may demand. Various light profile 
models are built into the code, including the "Nuker" law 
(Lauer et al. 1995), the Sersic profile (Sersic 1968), an exponential 
disc, Gaussian or Moffat functions and even a pure PSF 
for modelling stars. Galfit convolves all model profiles, except 
for the PSF itself, with the PSF to simulate image smearing by 
Earth's atmosphere and telescope optics. 

Although a scientist has a multitude of options to choose 
from for fitting and detecting objects, analysing a complete survey 
to obtain a source catalogue with galaxy 
parameters requires many intermediate steps. For example, 
duplicate sources from tile overlaps have to be differentiated; 
the detection and fitting codes have to be set up; a proper 
local background sky value has to be estimated; resulting 
source parameters have to be compiled in a catalogue. As 
these steps are fairly general we have built a code that simplifies 
all these steps and largely automates the entire process. 
Our code, GALAPAGOS, performs all the required steps 
from a single setup and with minimal manual interaction provides 
a fitting catalogue. It runs SExtractor to detect sources 
and performs an automated Sersic fit using Galfit. Amongst 
the various codes introduced above, we opted to use Galfit 
because it outperforms Gim2d both in speed and reliability 
(Haussler et al. 2007) and allows a much wider range of light-profile 
models than Budda. Upcoming versions will include additional 
features like automated multi-component fitting. The 
code is available freely for public download from our website 
at: http://astro-staff.uibk.ac.at/~m.barden/galapagos/ 

The layout of the paper is as follows. We start by giving 
an overview of the structure of the code (Sec. [2]). Then we elaborate 
on the methods involved in the individual components 
(Sec. [3]). Next, we present some fitting results based on simulated 
data and provide details concerning the reliability of the 
code (Sec. [4]). Subsequently, we give estimates on the performance 
of Galapagos (Sec. [5]), followed by a summary (Sec. [6]). 
Upon first reading this article we suggest skipping Sec. [3], which 
mainly addresses the frequent GALAPAGOS user. In the course 
of the paper we assume a working knowledge of SExtractor 
and Galfit and refer the reader to the publications by 
Bertin & Arnouts (1996) and Peng et al. (2002). 



2 OVERVIEW OF CODE STRUCTURE 

Galapagos is divided into four main blocks, each of which is 
executable independently from the others. This allows the flexibility 
of repeating or optimising certain segments of the analysis 
without re-running the entire pipeline. These blocks are: 

(i) Detect sources by running SExtractor (B) 
(ii) Cut out postage stamps for all detected objects (C) 
(iii) Estimate sky background, prep. & run Galfit (D) 
(iv) Compile catalogue of all galaxies (F) 

Note that letters in brackets correspond to the respective sections 
in the GALAPAGOS setup file (see Sec. A1). We visualise 
this structure in Fig. [1]. 

Also note that GALAPAGOS does not create the PSF image, 
which is required by Galfit in the fitting process. The user is 
responsible for providing such an image. A proper PSF should 
have a sufficient S/N, i.e. better than the brightest objects in 
the survey, in order not to degrade the science data. Further- 
more, it should contain all features of the PSF down to the 
noise and it should not be truncated at the edges. Also, it has 
to be background subtracted and normalised to a total flux of 
1. 



2.1 Source Detection 

In the first block (B), SExtractor is run to detect sources 
on the individual survey images. Optionally, GALAPAGOS features 
a high dynamic range (HDR) mode for source extraction 
(Sec. 3.1), which is ideally suited for wide area and/or space-based, 
e.g. HST, data. After a first pass, the user may refine 
this catalogue by identifying "bad" detections followed by re-running 
SExtractor. This may be required to fix overly deblended 
sources or to remove spurious detections (see Sec. 3.6). 
Once all tiles are analysed, GALAPAGOS combines the individual 
output catalogues, rejecting duplicate sources (see Sec. 3.1) and 
optionally bad detections like cosmic rays etc. (see Sec. 3.6). 



2.2 Postage Stamp Cutting 

To reduce the amount of time needed to ingest an image into 
Galfit it is worthwhile to first extract each galaxy from the 
survey mosaic. Therefore, in the second block (C), GALAPAGOS 
estimates a size for each object based on its SExtractor 
parameters. With this information it computes the extent of a 
postage stamp. From the original survey images, GALAPAGOS 
then creates such a cutout for every object. It performs the 
subsequent fitting with Galfit on these postage stamps (see 
Sec. 3.3). At this stage, GALAPAGOS creates for every survey 
image a "sky-map" containing information about the nature of 
the pixel flux (either "no flux", "sky" or "source"). It uses this 
map later on to identify blank sky pixels (see Sec. 3.4). 



2.3 Sky Estimation and Fitting 

The third block (D) performs the major fitting work. For every 
object in the source catalogue it prepares and runs Galfit (see 
Sec. 3.5). Accurate fitting analysis by Galfit requires careful 
consideration, which includes identifying the proper sky background, 
identifying neighbours and providing initial parameter 
guesses to start the fitting. 
[Figure 1 flowchart. Block B: SExtractor (run HDR SExtractor; identify "bad" detections; combine tile catalogues, GALFIT parameters empty). Block C: postage stamp cutting (cut postage stamps). Block D: sky estimation & run GALFIT (run GALFIT on brightest sources; multi-threaded execution on a single machine, making optimal use of all available CPUs; parallel execution of several instances of GALAPAGOS; GALFIT in batch mode). Block F: catalogue creation (insert GALFIT results into catalogue).] 

Figure 1. Code structure. A yellow background indicates the four main blocks, labelled according to the nomenclature of the GALAPAGOS setup 
file (see Fig. A1). Fitting objects with Galfit (Block (D)) is a two-stage process (see Sec. 3.5), which typically requires more than 90% of the 
total computation time (green background). We mark smaller tasks by blue boxes. For further details see Sec. [2]. 



Galapagos measures the sky using a flux growth curve 
including pixel rejection based on the "sky-map", which was 
calculated in the previous step (see Sec. 3.4). It uses the full 
survey image and not the small postage stamp to compute the 
sky. Note that even though Galfit can fit the sky, Galapagos 
does not use this option to avoid instances when neighbouring 
contamination makes accurate determination infeasible, and to 
reduce the number of degrees of freedom in the fit. We provide further 
justification for and details on this approach in Sec. 3.3 and 
Sec. 3.4. 



2.4 Catalogue Creation 

In the last block (F), GALAPAGOS reads the results of the fitting 
from the headers of the Galfit output images and puts 
them into the source catalogue (see Sec. 3.2). Here, it removes 
a second set of "bad" detections from the catalogue, namely 
those that were required in the fitting process to allow optimal 
results for neighbouring objects. Usually, these are bright 
artefacts in close proximity to relatively faint real sources (see 
Sec. 3.6). Finally, Galapagos compiles the resulting catalogue 
in a FITS-table. 



3 COMPONENTS 

In the following, we describe in detail the methods involved in 
the individual components of GALAPAGOS. These include SExtractor 
and high dynamic range (HDR) source extraction 
(Sec. 3.1), compiling a combined source catalogue (Sec. 3.2), 
the cutting of postage stamps (Sec. 3.3), estimating a background 
sky level robustly (Sec. 3.4), and fitting with Galfit 
(Sec. 3.5). In the last part of this section we introduce some 
technical mechanisms to optimise the code for robustness and 
speed (Sec. 3.6). 
Figure 2. Combining "hot" and "cold" SExtractor catalogues. The two panels (upper and lower) show examples of a cold (left side) and a 
hot (right side) SExtraction. Ellipses indicate the SExtractor Kron ellipses of the detected sources. Arrows mark objects from the hot (red) 
and the cold (blue) catalogue that were incorporated into the combined catalogue. Additional hot sources not marked with arrows (e.g. in the 
upper right panel) were excluded from the combined catalogue as described in detail in Sec. 3.1. 



3.1 SExtractor 

Galapagos incorporates SExtractor to detect astronomical 
sources on individual survey tiles. Details on how to operate 
SExtractor can be found in Bertin & Arnouts (1996). 
SExtractor uses the Kron radius to estimate the extent of 
a galaxy (Kron 1980; Infante 1987). For both stars and galaxies, 
when convolved with a Gaussian seeing, it encircles 90% of 
their flux. Galapagos applies the Kron radius e.g. to estimate 
sizes of postage stamps or to judge which pixels in an image 
are affected by light from sources. 

SExtractor has been used successfully with both ground- 
and space-based data. Yet, recent large CCD arrays put the 
code to its limits due to the wide range of object sizes and 
luminosities that are being observed simultaneously. In classic 
pencil beam surveys, the objects of interest are mostly faint and 
small. SExtractor is then fine-tuned to pick up such sources 
properly at the cost of splitting up the occasional big bright 
spiral galaxy into many pieces. On the other hand, wide area 
surveys traditionally do not reach very deep. When fine-tuning 
SExtractor for these surveys, emphasis is put on correctly 
deblending the larger and brighter objects while losing some 
depth. In both applications one reaches the dynamic range limit 
(in terms of object size and brightness) and has to make a 
compromise of depth and proper deblending. 



HDR SExtraction 

Fortunately, there is a rather simple two-step approach using 
SExtractor to overcome this problem. Firstly, one runs SEx- 
tractor in a so-called "cold" mode in which only the brightest 
sources are picked up and properly deblended. As this will miss 
many faint sources, in a second setup emphasis is put on depth. 
The second run we term "hot" mode. 

Then one needs to combine the "hot" and the "cold" runs. 
Firstly, all "cold" sources are imported into the output cata- 
logue. Then the Kron ellipses as provided by SExtractor of 
"hot" and "cold" sources are analysed. Every source position 
in the "hot" catalogue is checked whether it falls inside a Kron 
ellipse of a "cold" source. If it lies inside a Kron ellipse it is dis- 
carded and does not enter the output catalogue; if its central 
position lies "sufficiently" outside of all "cold" Kron ellipses it 
does enter the output catalogue. "Sufficiently" here refers to 
the possibility in GALAPAGOS to artificially enlarge the Kron 
ellipses slightly for this purpose: parameter B09 provides a scal- 
ing factor. Setting B09 to e.g. 1.1 results in enlarging each Kron 
ellipse by 10%. 
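
The combination rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GALAPAGOS implementation (which is written in IDL): the `Source` class, the function names and the convention that `a`, `b` are the Kron semi-axes in pixels are all hypothetical; `scale` plays the role of parameter B09.

```python
import math
from dataclasses import dataclass

@dataclass
class Source:
    x: float      # centre position in pixels
    y: float
    a: float      # Kron semi-major axis in pixels (assumed convention)
    b: float      # Kron semi-minor axis in pixels
    theta: float  # position angle in degrees, counter-clockwise from the x-axis

def inside_ellipse(px, py, src, scale=1.0):
    """True if the point (px, py) lies inside the Kron ellipse of src enlarged by scale."""
    t = math.radians(src.theta)
    dx, dy = px - src.x, py - src.y
    # rotate the offset into the frame aligned with the ellipse axes
    u = dx * math.cos(t) + dy * math.sin(t)
    v = -dx * math.sin(t) + dy * math.cos(t)
    return (u / (scale * src.a)) ** 2 + (v / (scale * src.b)) ** 2 <= 1.0

def combine_hot_cold(cold, hot, scale=1.1):
    """Keep all cold sources; add hot sources whose centre lies outside every enlarged cold ellipse."""
    combined = list(cold)
    for h in hot:
        if not any(inside_ellipse(h.x, h.y, c, scale) for c in cold):
            combined.append(h)
    return combined

cold = [Source(50.0, 50.0, 20.0, 10.0, 0.0)]
hot = [
    Source(60.0, 52.0, 3.0, 2.0, 0.0),  # well inside the cold ellipse: discarded
    Source(71.0, 50.0, 2.0, 2.0, 0.0),  # outside the bare ellipse, inside the enlarged one: discarded
    Source(90.0, 80.0, 2.0, 2.0, 0.0),  # far outside: kept
]
catalogue = combine_hot_cold(cold, hot, scale=1.1)
```

In this toy example the second hot source is only rejected because of the 10% enlargement; with `scale=1.0` it would enter the combined catalogue.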

In summary, it is important that the "cold" run properly 
deblends all brighter objects, while the "hot" run is tuned to 
pick up fainter sources. We term this mode "High Dynamic 
Range (HDR) SExtraction". 

To illustrate the process of including hot sources outside 
the Kron radius of cold sources into a combined catalogue, see 
Fig. [2]. In the upper left hand panel we show a "cold" run. The 
big central spiral galaxy is deblended correctly with the fainter 
galaxy below it. Also, the clumpy low surface brightness spiral 
in the upper left corner is detected as a single source. All three 
sources are taken over into the combined catalogue. Requir- 
ing a proper deblending of the bright objects results in missing 
the faintest sources, though. The "hot" run (upper right hand 
panel) picks those up. However, it breaks the brighter galaxies 
up into many sources. In the example, an off-centre knot of the 
upper left galaxy was detected as a separate object. Moreover, 
the outer regions of the central (and upper left) galaxy are as- 
signed separate source IDs. These "spurious" detections change 
the effective size of the central galaxy (compare diameters of the 
Kron ellipses in the left and right figures). Interestingly, the 
relatively bright galaxy below the central object is not deblended 
properly in the hot run. Furthermore, the size and position angle 
of the upper left detection demonstrate the lower detection 
threshold of the hot setup. In the hot setup a larger fraction of 
the low surface brightness flux is included in the calculation of 
the position angle, thus providing a much better estimate than 
the cold setup, which is more heavily weighted towards the inner 
regions of the sources. Yet, the values from the cold run enter 
the combined catalogue as deblending is the more important 
source of error. Also, Galfit calculates structural parameters 
like the position angle much more reliably. The lower panels 
in Fig. [2] show another example. Again, the deblending in the 
hot run is bad, while in the cold run it is correct. The faintest 
sources are only detected in the hot run. Bad deblending in the 
hot run strongly affects the calculation of the position angle of 
the brightest source, while in the cold run it is acceptable. 

We developed and tested this method for the GEMS 
survey (Rix et al. 2004; Caldwell et al. 2008; for tests see 
Haussler et al. 2007). Subsequently, other major surveys have 
adopted it as well, including COSMOS (Koekemoer et al. 2007; 
Leauthaud et al. 2007) and STAGES (Gray et al. 2008). Galapagos 
provides the option for running SExtractor in two-stage 
HDR or normal single-stage SExtractor configuration. 



3.2 Catalogue Compilation 

Compiling the output source catalogue is a two-stage process. 
Galapagos creates a first combined catalogue from the SEx- 
tractor output tables. In the subsequent model fitting pro- 
cess, Galapagos fills this catalogue with the Galfit output 
parameters. 

When putting together catalogues from potentially over- 
lapping images, GALAPAGOS has to take care of removing de- 
tections of the same source on multiple images. To this end, 
it uses the world coordinate system of the images to translate 
pixel coordinates from one to another image. Next, GALAPAGOS 
calculates the distance to the image border for each source (not 
only those in the overlap area) in the corresponding image cat- 
alogues. The area containing flux (pixels with non-zero values) 
defines the image border. This is crucial in particular for non- 
rectangular images (e.g. from HST). Now, GALAPAGOS sorts the 
two catalogues by border-distance. It starts with the source farthest 
from the edge, which we assume to be on image A (source 
1a; see Fig. [3]). Then it checks whether there are sources inside 
the Kron ellipse of the current object in the neighbouring image 
B (sources 1b, 2b and 3). If it finds any such targets, GALAPAGOS 
removes them from the list. Note that GALAPAGOS does 
not remove objects overlapping with the source from image A 
from the list (sources 2a and 4). Following this scheme it works 
through the complete list, from the farthest to the closest objects 
to the boundary, and constantly updates the list in the 
process. 

Figure 3. Combining SExtractor catalogues from neighbouring 
tiles. Tile A contains sources 1a, 2a and 4, while objects 1b, 2b and 
3 were detected on tile B. Ellipses show the corresponding sizes from 
SExtractor. For a description of what source ends up in the resulting 
table see Sec. 3.2. 
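
A minimal Python sketch of the border-distance scheme follows. It assumes that all positions have already been transformed into a common pixel frame via the world coordinate system and that the distance of each detection to the edge of its own tile is known; the `Detection` class and all function names are hypothetical and chosen for illustration only.

```python
import math
from dataclasses import dataclass

@dataclass
class Detection:
    name: str
    tile: str           # tile on which the source was detected
    x: float            # position in a common frame (assumed: already WCS-transformed)
    y: float
    a: float            # Kron semi-major axis
    b: float            # Kron semi-minor axis
    theta: float        # position angle in degrees
    border_dist: float  # distance to the edge of its own tile

def centre_inside(inner, outer):
    """True if the centre of inner lies inside the Kron ellipse of outer."""
    t = math.radians(outer.theta)
    dx, dy = inner.x - outer.x, inner.y - outer.y
    u = dx * math.cos(t) + dy * math.sin(t)
    v = -dx * math.sin(t) + dy * math.cos(t)
    return (u / outer.a) ** 2 + (v / outer.b) ** 2 <= 1.0

def remove_duplicates(detections):
    """Walk from the farthest to the closest source to the border and drop
    detections from other tiles that fall inside the current Kron ellipse."""
    ordered = sorted(detections, key=lambda d: d.border_dist, reverse=True)
    removed = set()
    for cur in ordered:
        if cur.name in removed:
            continue
        for other in ordered:
            if (other.tile != cur.tile and other.name not in removed
                    and centre_inside(other, cur)):
                removed.add(other.name)
    return [d for d in ordered if d.name not in removed]

# Toy version of the situation in Fig. 3, plus one unmatched source (5b).
dets = [
    Detection("1a", "A", 100.0, 100.0, 40.0, 30.0, 0.0, 60.0),
    Detection("2a", "A", 120.0, 110.0, 5.0, 4.0, 0.0, 40.0),
    Detection("4", "A", 90.0, 85.0, 4.0, 3.0, 0.0, 50.0),
    Detection("1b", "B", 101.0, 99.0, 38.0, 29.0, 0.0, 20.0),
    Detection("2b", "B", 120.5, 110.2, 5.0, 4.0, 0.0, 10.0),
    Detection("3", "B", 130.0, 100.0, 6.0, 4.0, 0.0, 15.0),
    Detection("5b", "B", 200.0, 100.0, 5.0, 5.0, 0.0, 55.0),
]
kept = remove_duplicates(dets)
kept_names = sorted(d.name for d in kept)
```

Sources 2a and 4 survive although they lie inside the ellipse of 1a, because only detections from the other tile are removed.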

A problem arises for sources, say in image A (source 1a), 
extending over a radius larger than the size of the overlap area 
and having overlapping detections on image B, which are not 
covered by image A (source 3 in Fig. [3]). Or put differently, if 
source 1 is e.g. deblended differently in image A than in image 
B, sources might get lost in the combination process. In such 
a case, Galapagos includes the main source from image A 
(source 1a) in the catalogue and all overlapping sources from 
image A (sources 2a and 4). It removes the overlapping sources from 
image B, though (sources 2b and 3). However, in cases where 
source 1 was over-deblended in image B, but not in image A, 
this would result in a welcome clean-up of the catalogue by 
removing the spurious source 3. Although this problem cannot 
be unambiguously solved, in practice it rarely occurs. It can be 
avoided completely if the largest source in the survey is smaller 
than the overlap between survey images. 

Fig. [4] shows an example for this procedure to remove duplicate 
detections. The bright galaxy a) is just on the edge of 
the red image. As only half its flux is visible on that image, 
the calculated centre is far off from the real position. The blue 
image fully contains this galaxy. The two central positions (in 
red and blue) being so different, a normal nearest neighbour 
matching algorithm with a maximum matching radius would 
not have been able to identify the two detections as the same 
source. In the proposed scheme, though, the red detection is not 
put into the combined catalogue, for being inside the Kron el- 
lipse of a source that is further from the image edge in the blue 
image. Similarly, the red source b) is further from the image 
edge than the blue source and thus, we reject the blue object. 
Objects without counterpart in the other image, as in c), we do 
keep in the combined catalogue. 

As Galapagos performs duplicate removal before running 
Galfit, it does not fit sources twice. Note that for fitting 
sources at the edge of an image, GALAPAGOS takes objects on 
neighbouring survey images into account as well (see Sec. 3.6). 
Figure 4. Combining SExtractor catalogues from neighbouring 
tiles. The left image (blue area) extends out to the right (blue) di- 
agonal line; the right image (red area) extends out to the left (red) 
line. Shaded areas outside of the lines corresponding to the respective 
image did not receive sky flux. Pluses (blue) indicate source detec- 
tions (hot and cold already combined) from the left (blue) image; 
crosses (red) mark detections from the right (red) image. Diamonds 
(blue) highlight objects that are contained in the combined catalogue 
if they originated from the left (blue) image; boxes (red) highlight 
those that were taken from the right (red) image. Catalogue combination 
is based on the SExtractor ellipses (see Sec. 3.2). Such an 
ellipse is shown in case a). The source from the right image (red) is 
rejected as it lies inside an ellipse of a detection in the left image 
(blue), which is further away from the respective image boundary 
(blue and red lines). For the same reason in case b) the red source 
is kept. The detection in case c) does not even have a counterpart in 
the other catalogue. 



3.3 Postage Stamps 

To optimise galaxy fitting with Galfit, Galapagos cuts 
the science images into smaller sections centred on individual 
sources. The advantage of using such postage stamps is that the 
total fitting time and the demand on main memory can be reduced. 
Even rather deep optical surveys contain large fractions 
of empty sky, which can mostly be excluded from the fit once 
the information from the sky pixels is effectively used to estimate 
the background (see Sec. 3.4; even masking cannot totally 
diminish this advantage). In a typical one-orbit HST survey 
around a factor of 2 in the total number of pixels can be saved. 
Moreover, although Galfit allows simultaneous fitting of mul- 
tiple sources, modelling more than a handful of objects at the 
same time quickly becomes rather impractical. Thus, to opti- 
mise automated fitting of large numbers of sources, GALAPAGOS 
incorporates a postage stamp cutting facility. 

To determine the size of the postage stamps, GALAPAGOS 
uses the Kron radius. The user specifies a scale factor C03 by 
which the Kron radius is enlarged. The decision for this scaling 
should be guided by trying to find a compromise between maximal 
area, to include as much flux of the central source (and 
maybe the closest neighbours) as possible, and minimal area, 
to speed up computation time of Galfit. Finding a good compromise 
is important as elliptical galaxies require a larger area 
than spiral galaxies, owing to their extended and slowly dimming, 
low surface brightness wings. For the one-orbit HST surveys 
GEMS and STAGES, we found a factor of 2.5 to work well. 
Galapagos does not enlarge the size of the postage stamps in 
the presence of close neighbouring galaxies. However, they are 
properly taken into account in the fitting process (see Sec. 3.5). 
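
One way to turn a scaled Kron ellipse into a rectangular cutout is to take the axis-aligned bounding box of the enlarged ellipse. The sketch below illustrates this; the text does not specify how GALAPAGOS derives the stamp extent from the Kron radius, so the bounding-box choice, the function name `stamp_bounds` and its arguments are assumptions made for this example. The argument `scale` stands in for the user-specified stamp scale factor.

```python
import math

def stamp_bounds(x, y, a_kron, b_kron, theta_deg, scale, nx, ny):
    """Inclusive pixel bounds (xlo, xhi, ylo, yhi) of a postage stamp that
    encloses the Kron ellipse enlarged by scale, clipped to an nx by ny image."""
    t = math.radians(theta_deg)
    big, small = scale * a_kron, scale * b_kron
    # half-extents of the axis-aligned bounding box of a rotated ellipse
    half_x = math.sqrt((big * math.cos(t)) ** 2 + (small * math.sin(t)) ** 2)
    half_y = math.sqrt((big * math.sin(t)) ** 2 + (small * math.cos(t)) ** 2)
    xlo = max(0, math.floor(x - half_x))
    xhi = min(nx - 1, math.ceil(x + half_x))
    ylo = max(0, math.floor(y - half_y))
    yhi = min(ny - 1, math.ceil(y + half_y))
    return xlo, xhi, ylo, yhi

# A source with Kron semi-axes of 10 and 5 pixels and a scale factor of 2.5
central = stamp_bounds(100.0, 80.0, 10.0, 5.0, 0.0, 2.5, 1000, 1000)
rotated = stamp_bounds(100.0, 80.0, 10.0, 5.0, 90.0, 2.5, 1000, 1000)
at_edge = stamp_bounds(5.0, 5.0, 10.0, 5.0, 0.0, 2.5, 1000, 1000)
```

Clipping to the image bounds matters for sources near the tile edge, where the nominal stamp would extend beyond the data.
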
We note that a disadvantage of using postage stamps for 
fitting with the background sky as a free fit parameter (which 
we discourage the user from doing in the context of Galapagos) is 
that the fit results will be biased if the postage stamp does not 
contain enough empty sky pixels. In such a case the χ² of the 
fit might indicate a good fit, yet the result would be flawed 
by attributing too much or too little flux to the object. This 
could potentially also have a strong impact on other structural 
parameters. Therefore, GALAPAGOS does not allow a free fit of 
the sky background within Galfit, but estimates a value before 
the fitting. We give details on the background estimation in the 
following section. 



3.4 Sky Estimation 

Obtaining a precise sky level is the most critical systematic in 
galaxy surface brightness profile fitting (see e.g. de Jong 1996; 
Haussler et al. 2007). To obtain a precise background measurement, 
Galfit is capable of including the sky as a free parameter 
when fitting a celestial source. However, using the sky as a free 
parameter requires an appropriate size of the input image, i.e. 
it has to contain all the flux of the primary source and most of 
the flux of neighbouring sources that are to be fitted simultane- 
ously and ample sky. For estimating a proper sky background, 
the image should be as large as possible. However, as detailed 
above, large postage stamps become impractical once too many 
neighbouring sources are included. Only a manual setup may 
allow using the sky as a free model parameter. To enable au- 
tomated processing of large numbers of objects, GALAPAGOS 
incorporates its own subroutine to obtain an optimal sky mea- 
surement before running Galfit and hence uses a fixed value 
during fitting. With the proper setup, the resulting GALAPAGOS 
estimate improves significantly over values obtained from 
SExtractor. 

We use a flux growth method to estimate the local sky 
around an object. Calculating the average flux in elliptical annuli 
centred on the object of interest while excluding other 
sources or image defects, we obtain the background flux as a 
function of radius. Once the slope over the last few measurements 
levels off, Galapagos stops and determines the sky from 
those last few annuli (see Fig. [5]). 
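
The stopping logic of such a growth curve can be sketched as follows. The sketch assumes that the mean flux per annulus (with masked pixels already excluded) has been measured and is passed in as a list; it stops when the slope over the last `n_last` annuli turns positive for the second time, as indicated in the caption of Fig. 5. The least-squares slope, the plain mean used for the sky value and all names are assumptions of this example rather than details given in the text.

```python
def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    xm = sum(xs) / n
    ym = sum(ys) / n
    num = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    den = sum((x - xm) ** 2 for x in xs)
    return num / den

def sky_from_growth_curve(radii, fluxes, n_last=5, max_turns=2):
    """Return (sky, stop_radius, converged) from annulus-averaged fluxes.

    The iteration stops once the slope over the last n_last annuli has
    turned positive max_turns times; the sky is the mean of those annuli."""
    turns = 0
    was_positive = False
    for i in range(n_last - 1, len(radii)):
        r_win = radii[i - n_last + 1:i + 1]
        f_win = fluxes[i - n_last + 1:i + 1]
        positive = slope(r_win, f_win) > 0.0
        if positive and not was_positive:
            turns += 1
            if turns == max_turns:
                return sum(f_win) / n_last, radii[i], True
        was_positive = positive
    # profile ended before the stopping criterion was met
    tail = fluxes[-n_last:]
    return sum(tail) / len(tail), radii[-1], False

# Toy profile: steep decline, a bump from outer galactic structure, then flat sky
radii = [float(r) for r in range(1, 13)]
fluxes = [100.0, 60.0, 35.0, 20.0, 12.0, 13.0, 14.0, 11.0, 10.4, 10.1, 10.0, 10.2]
sky, r_stop, converged = sky_from_growth_curve(radii, fluxes, n_last=3)
```

In the toy profile the bump causes the first sign change of the slope, so the iteration continues past it and only stops once the profile has flattened.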

For this procedure to work, we create a "sky-map", i.e. a 
copy of the input images where the pixel values indicate the 
nature of the contained fiux. In the sky-map a pixel value of 
stands for blank background sky, while positive numbers in- 
dicate the presence of a source. A value of -1 indicates no fiux 
at all, as happens with HST images that are geometrically dis- 
torted (see Fig. (SI . One might think that to make the decision 
between source or sky the SExTRACTOR segmentation map (for 
a definition see lBertin fc Arnoutslll996l ) might suffice. Unfortu- 
nately, the level out to which SExTRACTOR detects objects is 
rather limited. In particular with elliptical galaxies SExTRAC- 
TOR underestimates the flux belonging to the object signifi- 
cantly. Changing the SExTRACTOR setup parameters cannot 
totally remedy this. Therefore, a significant number of pixels 
still containing some source fiux would be assigned as "sky". 
To circumvent this problem, we instead use Kron ellipses to de- 
termine the extent of an object. GALAPAGOS regards any pixel 
inside D03x7?Kron+D05 as containing source flux (for STAGES: 
DOS = 3; D05 = 20 pix). Note that this scahng factor DOS does 
not have to coincide with the scale for the size of the postage 
stamps C03. Note also that GALAPAGOS records the total number of
objects that might contribute to a certain pixel, i.e. where two
sources overlap, the value in the intersection of the two Kron
ellipses is two. A weight map (exposure time map) specified by the
user defines the off-chip pixels, which are given a value of -1.
GALAPAGOS identifies these as pixels with zero exposure time.

Figure 5. Sky estimation. Left: The average flux f measured in elliptical annuli (blue) centred on an object (here a) determines the background
level. In each annulus, we exclude regions surrounding other sources from the calculation (shaded area). For the indicated annulus, we exclude
dark blue shaded regions b; only light blue regions c define the average background flux. Right: Flux f measured in an elliptical annulus as a
function of radius r. a: Starting radius. b: Slope (indicated by the diagonal lines) turns positive for the first time, e.g. due to galactic structure
at large radii. c: Slope turns positive for the second time. Here we stop the iteration. d: We compute slope measurements from the last n sky
estimates (here: n=5; n is a user parameter). e: The adopted background sky level. See Sec. 3.4 for details.

Figure 6. The "skymap" (left): for each object that was detected in the image (right) we calculate the Kron ellipse and scale it up. Pixels
inside a Kron ellipse get the value one. Pixel values stack, e.g. where two Kron ellipses overlap, the pixel value is two. Blank sky has a value of
zero; pixels without astronomical flux, as occurs after removing image distortions e.g. in HST images, have a value of -1. Some pixel values are
indicated.
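
The construction of the skymap can be illustrated with a minimal Python sketch. It is not the GALAPAGOS implementation (which is written in IDL); the function name, the input layout and the choice to apply scale factor and offset to both Kron semi-axes are assumptions made for illustration only.

```python
import numpy as np

def build_skymap(shape, objects, weight, scale=3.0, offset=20.0):
    """Count, per pixel, how many enlarged Kron ellipses cover it.

    objects: iterable of (x, y, a_kron, b_kron, theta_rad); this layout is
    hypothetical. scale and offset play the roles of D03 and D05; applying
    them to both semi-axes is an assumption of this sketch.
    """
    yy, xx = np.indices(shape)
    skymap = np.zeros(shape, dtype=int)
    for x0, y0, a, b, theta in objects:
        a_big, b_big = scale * a + offset, scale * b + offset
        dx, dy = xx - x0, yy - y0
        # Rotate pixel offsets into the frame of the ellipse.
        u = dx * np.cos(theta) + dy * np.sin(theta)
        v = -dx * np.sin(theta) + dy * np.cos(theta)
        # Overlapping ellipses stack: each one adds 1 to the pixels it covers.
        skymap += ((u / a_big) ** 2 + (v / b_big) ** 2 <= 1.0)
    # Pixels with zero exposure time in the weight map are flagged with -1.
    skymap[weight <= 0] = -1
    return skymap
```

A pixel value of zero then marks blank sky that is usable for the background estimate, while any positive value marks potential source flux.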

Galapagos takes special care to minimise the impact of large nearby
sources on the background estimation process for the current object.
To that end, GALAPAGOS relies on the SExtractor output catalogue to
provide shape information. Under the assumption that all sources have
a Sersic index n = 4 and a half-light radius r_e = (flux_radius)^α,
with the SExtractor catalogue parameter flux_radius and a
user-specified power α (we chose α = 1.4; D11) to convert the
SExtractor flux_radius to a "true" half-light radius, GALAPAGOS
calculates the flux of all catalogue objects at the position of the
current source. GALAPAGOS regards any source exceeding a
user-specified limit D09 as an important flux contributor for the
current object. Subsequently, we will term the sources that are
selected in this way "contributors". Note that D09 has the units of a
magnitude, i.e. "exceeding" the given limit implies a number smaller
than this value. As the SExtractor flux_radius is a rather poor proxy
for the true half-light radius and no proper estimate of the Sersic
index is available, we opt for a rather conservative limit for this
flux cut.
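
The selection of contributors can be sketched as follows. This is an illustration under stated assumptions, not the GALAPAGOS code: the profile is taken to be circular and unconvolved, the normalisation constant uses the asymptotic approximation of Ciotti & Bertin (1999), and comparing the limit D09 with a surface brightness in magnitudes per pixel is our assumption. All function names are hypothetical.

```python
import math

def sersic_kappa(n):
    # Asymptotic approximation to the Sersic normalisation constant.
    return 2.0 * n - 1.0 / 3.0 + 4.0 / (405.0 * n) + 46.0 / (25515.0 * n * n)

def mag_per_pixel_at(distance, mag_total, flux_radius, alpha=1.4, n=4.0):
    """Surface brightness (mag per pixel) of a circular Sersic profile."""
    r_e = flux_radius ** alpha            # r_e = (flux_radius)^alpha
    k = sersic_kappa(n)
    # Total flux in units of Sigma_e: 2 pi n r_e^2 e^k Gamma(2n) / k^(2n)
    f_tot = (2.0 * math.pi * n * r_e ** 2 * math.exp(k)
             * math.gamma(2.0 * n) / k ** (2.0 * n))
    # Fraction of the total flux falling on one pixel at this distance.
    rel = math.exp(-k * ((distance / r_e) ** (1.0 / n) - 1.0)) / f_tot
    return mag_total - 2.5 * math.log10(rel)

def is_contributor(distance, mag_total, flux_radius, mag_limit):
    # "Exceeding" a magnitude limit means a numerically smaller value.
    return mag_per_pixel_at(distance, mag_total, flux_radius) < mag_limit
```

With this convention a source that is brighter by 10 mag at the same distance and size yields a value that is smaller by exactly 10 mag, so bright, extended objects pass the cut out to much larger distances than faint ones.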

If a proper Galfit fit exists for the contributors, GALAPAGOS
temporarily subtracts their model profile from the input image, i.e.
for the duration of the current background estimation. Note that
removal of a model profile includes convolution with the telescope
PSF before subtraction. In order to optimise the profile subtraction,
GALAPAGOS processes the SExtractor source catalogue in order of
increasing magnitude. As the very few brightest sources have a
significant impact on both the sky estimation and the fitting of a
large number of fainter sources, starting the fitting process with
the brightest galaxies is essential. We give further details about
the sorting process in Sec. 3.6.

Normally, the Kron ellipse of the current object defines the
starting radius for the iterative measurement of the sky background
in annuli of increasing radius. If potentially dominant flux
contributors are present, for which no Galfit model
Figure 7. Fitting Sersic profiles with Galfit. From left to right, the panels show the original galaxy image, the Sersic model and the residual
of image and model, respectively. GALAPAGOS excludes (masks) areas shaded in red from the fit. In this example no bright secondary sources
were detected. The next brightest object after the primary source is too far away to become a secondary (for details on the definition of primary
and secondary sources see Sec. 3.5). Note that the masked region at the right edge of the image results from the irregular shape of the HST
images. This area has not received any flux and thus GALAPAGOS masks it as well.



exists yet, and which hence were not subtracted from the input image,
Galapagos increases the starting radius to the maximal distance of
all such sources from the current object, as they might potentially
influence the fitting. For each sky annulus, it estimates an average
flux value, excluding any pixels that were flagged in the skymap as
containing an object (or as having a defect or no flux). First,
Galapagos symmetrically clips all 3σ outliers from the distribution
of the remaining pixels. Then it fits a Gaussian function to the
leftover distribution, producing a mean value for the current
annulus. After each new sky annulus measurement, GALAPAGOS calculates
a robust linear fit to the last few estimates (D13; in the case of
STAGES 15 measurements). As long as source flux is still measurable,
this slope is negative. Once this process reaches the true
background, the estimated slope should start to randomly change its
sign. When the slope turns positive for the second time, GALAPAGOS
stops the loop and obtains the final background value from the last
D13 measurements. Stopping the process at the first positive slope
measurement often results in suboptimal estimates, as galactic
inhomogeneities (like spiral arms) might produce dips sufficient to
cause a slope sign change. However, using a later slope change than
the second is not necessary in practice. Note that neighbouring
sources are not a problem for the termination of this iteration, as
the method takes special care to take their influence into account
(as shown above). This whole process is fully user-configurable,
including options for the width of the sky annuli D07, their spacing
D06, the initial starting radius D08 and the magnitude cut D09.
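
The stopping criterion described above can be sketched in Python. This is a simplified illustration rather than the GALAPAGOS implementation: a single clipping pass followed by a plain mean stands in for the Gaussian fit, an ordinary least-squares fit (numpy.polyfit) stands in for the robust linear fit, and the function names are hypothetical.

```python
import numpy as np

def clipped_mean(pixels, nsigma=3.0):
    """Mean of the pixel values after one symmetric n-sigma clip."""
    pixels = np.asarray(pixels, dtype=float)
    mu, sigma = pixels.mean(), pixels.std()
    keep = np.abs(pixels - mu) <= nsigma * sigma
    return pixels[keep].mean()

def sky_from_annuli(annulus_means, n_last=15, stop_at=2):
    """Walk outwards through the annulus means and stop when the slope of
    the last n_last values turns positive for the stop_at-th time."""
    values, turns, prev_slope = [], 0, None
    for value in annulus_means:
        values.append(float(value))
        if len(values) < n_last:
            continue
        window = np.array(values[-n_last:])
        # Plain least squares; the text describes a robust fit instead.
        slope = np.polyfit(np.arange(n_last), window, 1)[0]
        if slope > 0 and (prev_slope is None or prev_slope <= 0):
            turns += 1
            if turns == stop_at:
                break
        prev_slope = slope
    # Adopted sky: average of the last n_last annulus measurements.
    return float(np.mean(values[-n_last:])), len(values)
```

Here n_last corresponds to D13. Each entry of annulus_means would be obtained by passing the unflagged pixels of one annulus through clipped_mean.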



3.5 GALFIT 

Of the various light profiles built into Galfit, the most general 
one for galaxy fitting is the Sersic model. It is also used by 
Galapagos: 



$$\Sigma(R) = \Sigma_e \exp\left(-\kappa\left[(R/R_e)^{1/n} - 1\right]\right) \qquad (1)$$



where R_e is the effective or half-light radius, Σ_e is the effective
surface brightness, Σ(R) is the surface brightness as a function of
radius R, n is the Sersic index and κ = κ(n) is a normalisation
constant. The Sersic profile is a generalisation of a de Vaucouleurs
profile with variable Sersic index n. An exponential profile has
n = 1, while a de Vaucouleurs profile has n = 4.
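
As a numerical illustration of Eq. (1), the following Python sketch evaluates the profile and checks that R_e encloses half of the total light. It is independent of Galfit; the analytic approximation used for κ(n) (Ciotti & Bertin 1999) and the function names are assumptions of this sketch.

```python
import numpy as np

def kappa(n):
    # Approximation to kappa(n); chosen so that R_e encloses half the light.
    return 2.0 * n - 1.0 / 3.0 + 4.0 / (405.0 * n) + 46.0 / (25515.0 * n ** 2)

def sersic(r, sigma_e, r_e, n):
    """Surface brightness of Eq. (1) at radius r."""
    r = np.asarray(r, dtype=float)
    return sigma_e * np.exp(-kappa(n) * ((r / r_e) ** (1.0 / n) - 1.0))

def enclosed_fraction(r_e, n, r_max_in_re=60.0, steps=200001):
    """Fraction of the flux inside r_e, from a trapezoidal integration of
    2 pi r Sigma(r) out to r_max_in_re * r_e (adequate for low n only)."""
    r = np.linspace(0.0, r_max_in_re * r_e, steps)
    integrand = 2.0 * np.pi * r * sersic(r, 1.0, r_e, n)
    pieces = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r)
    cumulative = np.concatenate(([0.0], np.cumsum(pieces)))
    return np.interp(r_e, r, cumulative) / cumulative[-1]
```

By construction the profile equals Σ_e at R = R_e, and for an exponential profile (n = 1) the enclosed fraction evaluates to one half within the numerical accuracy.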



A simple setup script controls profile modelling with Galfit. It
contains information about input and output file locations, PSF
image, bad pixel mask, etc. A list of starting guesses defines which
light profiles are to be fitted. Although the downhill gradient
method incorporated in Galfit is often speculated to be prone to
converging to a local instead of the global minimum, in practice we
find it to be extremely robust, even in comparison to global
parameter space search algorithms (Häussler et al. 2007). In
application to high redshift survey data, the other two noteworthy
features are the included bad pixel mask (i.e. pixels that are
excluded from the fitting) and the number of simultaneously fitted
objects.



We show an example of Galfit output in Fig. 7. The left image
presents the input postage stamp. In this case a single component
(one object) was fitted. We show the resulting Sersic model in the
middle. Note that the brightness cuts and scaling in both left and
middle panels are the same. The right panel displays the difference
image of input minus model. Bright spiral features and dark dust
lanes that strongly deviate from the smooth Sersic profile are very
prominent in this image. In order not to bias the fit by neighbouring
sources and the image boundaries, GALAPAGOS excludes the shaded
region from the fit by applying a bad pixel mask (see below).



In order to define which objects do not have a high importance for
the current fit and hence may be masked instead of being fitted
simultaneously, we define the following terminology: the target of
the current fit is the primary source; any object whose expanded Kron
ellipse overlaps with that of the primary is a secondary source;
objects without any overlap with the primary we term tertiary
sources. We consider tertiary sources not to be important for the
quality of the fit. As a result, we mask and exclude them from the
modelling (Galfit ignores every pixel flagged in the mask image
during the fit). Secondary sources might have an impact on the
resulting parameters of the primary. Therefore, we fit them
simultaneously with the primary. We treat contributors (for a
definition see Sec. 3.4) as secondaries. The difference between
contributors and secondaries is that we do not require an overlap of
the Kron ellipses for contributors.
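
This terminology can be expressed as a short Python sketch. For brevity the expanded Kron ellipses are approximated by circles, which is a simplification made here and not what GALAPAGOS does; the dictionary layout and the function name are hypothetical.

```python
import math

def classify(primary, others, contributors=()):
    """Label each object as 'secondary' or 'tertiary' relative to the primary.

    Objects are dicts with 'id', 'x', 'y' and 'r', where 'r' is the radius of
    a circle standing in for the expanded Kron ellipse (an assumption).
    contributors: ids of sources selected by the flux criterion of Sec. 3.4.
    """
    labels = {}
    for obj in others:
        dist = math.hypot(obj["x"] - primary["x"], obj["y"] - primary["y"])
        overlaps = dist < obj["r"] + primary["r"]
        # Contributors count as secondaries even without any overlap.
        if overlaps or obj["id"] in contributors:
            labels[obj["id"]] = "secondary"
        else:
            labels[obj["id"]] = "tertiary"
    return labels
```

Note that the labels depend on the choice of primary: an object that is a tertiary for one primary may be a secondary for another.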
Figure 9. Galfit parameter setup scheme for secondaries and contributors. Depending on the relative position to the primary target, i.e. on
the same postage stamp or not, with a pre-existing fit from the same or another survey image or without a fit, we show the setup for the
Galfit parameters (left panel): static implies all parameters are fixed to their initial guess (i.e. the SExtractor estimates), while free means
that they are variable throughout the fit. In some cases the position pos, the axis ratio q or the position angle θ take a different state than the
remaining fit parameters. We visualise the situation in the right panel: The current primary P is located on the red survey image A, which has
some overlap (purple) with the blue image B. The solid black outline indicates the postage stamp corresponding to P. Potential secondaries or
contributors are numbered. For sources with a black background (3 & 4) no prior fit exists (SExtractor values are used as static/fixed profile
parameters), while for targets shown either in red (1) or blue colour (1 & 2) a fit from the respective survey image is available. For further
details see Sec. 3.5.




Figure 8. Optimisation of the Galfit setup. Circles indicate the
Kron ellipses used for classifying the detected objects (as secondaries
or tertiaries). This example was taken from real data. However, for
clarity we do not mark the faintest detections in this image. For
details see Sec. 3.5.



Simultaneous Fits 

Often, sources are so close to each other that they are best fitted
simultaneously. One might argue that after a simultaneous fit of two
sources, A and B, the best parameters are known for both objects.
However, in general this is not true. For example, we construct a
situation with three sources A, B and C, where C is on the opposite
side of A with B being in the middle (see Fig. 8). Let A be the
brightest source of the three, i.e. A is fitted first (see also
Sec. 3.6). The Kron ellipses of A and B and those of B and C overlap,
while the Kron ellipses of A and C do not. Fitting A implies fitting
B simultaneously: B is a secondary to the primary A. C is not
connected to A and therefore, following our prescription, we mask it.
As a tertiary, we exclude it from the fit. The resulting fit for B is
thus not optimal, as it neglects the presence of C, which is
important for fitting B, but not for fitting A. To obtain the optimal
fit for B, we have to fit B as a primary. In this case A and C are
secondaries, as their Kron ellipses both overlap with B, and are
fitted simultaneously. To speed up the fitting of B, we can now
insert the known parameters for object A, thus effectively removing
one component from the fit. This example highlights the importance of
fitting all objects once as primaries, while secondaries may be made
static if a fit already exists.

Normally, secondary sources are fitted simultaneously with the
current primary object (see above). Using a pre-existing fit (as in
the example) as static parameters for a secondary source is thus an
exception to this rule. A further complication is that the existing
fit for the secondary may have been obtained from a different survey
image than the current primary. In that case, the central position of
the secondary has to be converted via the world coordinate system
information from the original pixel coordinates to the current system
of the primary. Therefore, to allow optimal centring after such a
conversion, GALAPAGOS fixes all parameters of the secondary except
its central coordinates. If, in the previous example, object B is
fitted with a pre-existing fit of source A that was performed on a
different survey image than B, then the pixel centre of A would not
be static. However, a free pixel centre is only required if the
centre of the secondary A is also inside the postage stamp of the
current primary B. If the centre is off the postage stamp, sub-pixel
accuracy is no longer required for an optimal fit, and all components
of A are made static. We visualise this situation in Fig. 9 (cases 1
and 2).

Furthermore, if no fit exists for a secondary, a free fit for that
source is not always the best solution. In the case that the centre
of the secondary is not on the postage stamp, a free fit results in
too many degrees of freedom. In GALAPAGOS we then opt to fix the
position, axis ratio and position angle to the values provided by
SExtractor (while leaving the Sersic index n and the half-light
radius Re as free parameters; see Fig. 9, cases 3 and 4). This is
justified because, on the one hand, more than half the flux of the
secondary cannot be seen by Galfit, thus making it increasingly
difficult to come up with precise estimates for these parameters. On
the other hand, the values given by SExtractor usually have high
enough accuracy not to bias the fit of the primary significantly.
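
The rules for the parameter setup of secondaries laid out in the preceding paragraphs can be condensed into a small decision function. The Python sketch below only restates the text; the function name, the parameter labels and the return format are hypothetical, and the magnitude is left out because the text does not specify its state in every case.

```python
def secondary_setup(has_fit, same_image, centre_on_stamp):
    """Return (state per parameter, origin of the starting values)."""
    params = ("pos", "q", "pa", "n", "re")
    if has_fit:
        # A pre-existing fit is reused as a static component ...
        state = dict.fromkeys(params, "fixed")
        # ... except for the centre if the fit stems from another survey
        # image and the centre lies on the current postage stamp.
        if not same_image and centre_on_stamp:
            state["pos"] = "free"
        return state, "previous fit"
    # No fit yet: start from the SExtractor estimates.
    state = dict.fromkeys(params, "free")
    if not centre_on_stamp:
        # Too little flux on the stamp to constrain position and shape.
        for name in ("pos", "q", "pa"):
            state[name] = "fixed"
    return state, "SExtractor"
```

For example, secondary_setup(False, True, False) fixes pos, q and pa while leaving n and re free, which corresponds to the off-stamp case without a prior fit.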

In addition to the "normal" sources (secondaries and tertiaries) in
the immediate surroundings of the current object, Galapagos has to
take bright and large contributors as defined in Sec. 3.4 into
account as well, although these sources may be off the current survey
image. It treats them as secondaries without requiring their Kron
ellipse to overlap with the Kron ellipse of the primary. In terms of
the parameter setup, Galapagos handles them exactly like other
secondaries.
Figure 10. Mask creation. The left panels show galaxy images; the right panels the corresponding bad pixel masks. In the bad pixel masks
white and red represent good and bad pixels, respectively. Upper panels: a) and b) indicate primary and secondary sources, respectively, c) and 
d) mark examples of tertiary sources: c) is only partly masked, as it overlaps with a secondary source; d) is masked completely for not having 
any overlap with the primary or a secondary. Lower panels: The plotted area shows a postage stamp (indicated by the solid rectangles) and 
some of its surroundings. Note that the postage stamp is tilted (representation in world and not pixel coordinates) and that the blue area is 
actually not part of the postage stamp. 1) is a source that might potentially contribute to the fit of the primary, due to its brightness, and is 
included as a secondary source (with parameters fixed from a previous fit) although its centre is off the current postage stamp. 2) is a tertiary 
source without overlap with the primary and being too faint to contribute significantly, and is therefore masked completely (red pixels inside 
the postage stamp). 



Bad Pixel Masks 



Galfit supports so-called bad pixel masks (see Peng et al. 2002) to
exclude image regions from fitting and thus speed up the fitting
process. As tertiary sources may overlap with secondaries, we take
the following approach to define the area to be masked. In general,
GALAPAGOS masks the full Kron ellipse of a tertiary, enlarged by a
user-specified factor D04 (which may have a different value than the
one used for computing the skymap, D03) and an additional offset D05.
If the Kron ellipse of the tertiary overlaps with the Kron ellipse of
a secondary, Galapagos includes the intersection in the fit. However,
as the included area might contain significant flux (maybe even the
nucleus) of the tertiary, it excludes any pixel marked in the
SExtractor segmentation map as belonging to the tertiary. Thus, the
resulting shape of the mask may look complicated, yet this procedure
ensures that the fit of the secondary targets is only mildly
affected. The primary source should not be significantly affected at
all.

To speed up the fitting process by reducing the number of
simultaneous fits, GALAPAGOS masks secondary objects based on a
magnitude criterion (D16 for extended and D17 for point sources) in
comparison to the primary source. If they are too faint compared to
the primary, GALAPAGOS "downgrades" them to tertiary status and
treats them as such, i.e. it masks their Kron ellipses completely,
except for parts that overlap with other secondaries or the primary
and are not covered by the SExtractor segmentation map.

Galapagos also masks pixels that have a value of zero in
the weight map, i.e. an exposure time of zero. Obeying these
rules results in masks as shown in Fig. 10.
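
The masking rules can be summarised in a short Python sketch operating on boolean images. It is an illustration only: the helper for drawing ellipses, the function names and the convention that True marks an excluded pixel are assumptions made here.

```python
import numpy as np

def ellipse(shape, x0, y0, a, b, theta=0.0):
    """Boolean image that is True inside the given ellipse."""
    yy, xx = np.indices(shape)
    dx, dy = xx - x0, yy - y0
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0

def bad_pixel_mask(tertiary_ell, tertiary_seg, fitted_ell, weight):
    """True marks pixels excluded from the fit.

    tertiary_ell: union of the enlarged Kron ellipses of all tertiaries
    tertiary_seg: segmentation-map pixels belonging to tertiaries
    fitted_ell:   union of the ellipses of the primary and the secondaries
    """
    # Mask tertiaries, but keep the overlap with fitted objects unmasked ...
    mask = tertiary_ell & ~fitted_ell
    # ... unless the segmentation map assigns the pixel to a tertiary.
    mask |= tertiary_seg
    # Pixels with zero exposure time are always excluded.
    mask |= (weight <= 0)
    return mask
```

The segmentation-map term is what keeps the nucleus of a tertiary masked even where its Kron ellipse intersects that of a secondary.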



Parameter Constraints 

Galfit not only applies a bad pixel mask, but also allows fit
parameters to be constrained in various ways. Examples are keeping a
parameter within an acceptable absolute range (e.g. Sersic indices
should satisfy 0.5 < n < 8) or a relative range depending on the
given input values. Parameters might even be constrained with respect
to each other or to other components. For more details see the GALFIT
homepage (http://users.obs.carnegiescience.edu/peng/work/galfit/galfit.html).

With respect to Sersic fitting in GALAPAGOS, providing a suitable
range for the Sersic index n and the half-light radius Re has a
stabilising effect on the procedure. To this end, Galapagos also
imposes a limit on the difference between the Galfit and SExtractor
magnitudes. Galapagos incorporates global constraints on the Sersic
index n (0.2 < n < 8.0), the half-light radius Re (0.3 < Re < E11)
and the fit magnitude m (E12 < m_GALFIT - m_SExtractor < E13).
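
As an illustration, the global constraints could be written to a Galfit constraint file along the following lines. The sketch assumes the plain-text constraint syntax documented with Galfit, in which "a to b" denotes an absolute range and a pair of numbers denotes a range relative to the input value; it further assumes that the SExtractor magnitude serves as the input value for the magnitude. The function name is hypothetical and the values passed for E11, E12 and E13 in the usage below are placeholders, not the STAGES setup.

```python
def constraint_lines(component, re_max, dmag_min, dmag_max):
    """Constraint-file lines for one Sersic component (a sketch).

    re_max, dmag_min and dmag_max stand for the user parameters E11, E12
    and E13; the n range and the lower Re limit are those quoted in the text.
    """
    lines = [
        f"{component} n 0.2 to 8.0",                # absolute range for n
        f"{component} re 0.3 to {re_max}",          # absolute range for Re
        f"{component} mag {dmag_min} {dmag_max}",   # relative to input value
    ]
    return "\n".join(lines)

print(constraint_lines(1, 400.0, -5.0, 5.0))
```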



3.6 Computational Optimisation 

In the following section we will describe additional characteris- 
tics of Galapagos that increase the efficiency and robustness 
of the code. 



Sorting and Parallel Computation 

After running SExtractor and cutting postage stamps, Galfit fits the
individual catalogue sources. Because efficient removal of brighter
sources is needed for accurate estimation of the sky background, an
ordered processing is required. As this is extremely inefficient in
terms of total CPU time, we have developed methods to speed up this
sequential process. In the next paragraphs we describe the mechanisms
that are incorporated into Galapagos to switch from sequential to
parallel processing and to increase the overall efficiency and
robustness of the code.

To optimise the execution time, GALAPAGOS performs fit- 
ting in a rank-ordered sequence starting with the brightest 
source in the survey and progressing to the fainter ones. The 
advantage of this procedure is twofold: 

• Faint neighbours of bright sources do not have to be included
in a simultaneous fit (as a second component), as they do not
influence the resulting fit parameters of the bright object
significantly. The magnitude difference between faint and bright
neighbours is a free user parameter (D16, D17 if the primary is
a galaxy or a star, respectively).

• When a faint source has a brighter neighbour, which has to
be included in the fit as well, parameters for that object will
already exist from a previous fit. Hence the variables for that
component can safely be held fixed to the best values. This
reduces the total number of degrees of freedom and increases the
computation speed for a large number of sources tremendously.

Another reason in favour of sorting objects by magnitude is
that the efficiency with which bright contributors are included
in the current fit is greatly enhanced.

The weakness of the sequential approach is that it voids 
the speed benefits of parallel processing. To alleviate this prob- 
lem, we devise two methods: 

a) Consecutive sources in a rank-ordered list are usually
sufficiently far apart to not affect one another (the average object
size is much smaller than their typical distance). Therefore,
Galapagos starts the next object in the sequence as a new
process on another CPU (core), given that its distance from
other sources in the queue is large enough (D20). The extent of
the brightest object in the survey determines this distance and
it should exceed the limit out to which this object might have
an influence on the fitting of neighbours. If the next source in
the queue is too close to objects currently being executed, the
code waits for these objects to finish.

b) Generally it is possible to parallelise the analysis by running
the code on only one survey image at a time, encapsulating the sky
fitting and Galfit processing. This enables the user to run several
instances of the code simultaneously on n different computers, thus
reducing the total computation time by a factor of n. This is
realised in GALAPAGOS by specifying which tiles are to be processed
in a so-called "batch file" (E01). The problem with this approach is
that sources may extend from one survey image onto the next.
Therefore, one might run into the situation where tile A is fitted
before tile B, with the brightest source in the two tiles being on B
and reaching into A. In this case, a fit for the brightest object is
not available for estimating the optimal sky background for a number
of galaxies on tile A. The underlying idea of this method is that the
average object size is much smaller than the size of a survey image.
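
Method a) can be illustrated with a simplified scheduling sketch in Python. It does not start real processes and it is not the GALAPAGOS scheduler: as a simplifying assumption, all objects started together are taken to finish before the next group begins, whereas the code described above only waits for the objects that are too close. The names and the dictionary layout are hypothetical.

```python
import math

def can_start(candidate, running, min_dist):
    """True if the candidate is farther than min_dist from all running fits."""
    return all(
        math.hypot(candidate["x"] - o["x"], candidate["y"] - o["y"]) > min_dist
        for o in running
    )

def schedule(objects, n_cores, min_dist):
    """Group a magnitude-sorted queue into batches that may run in parallel."""
    queue = sorted(objects, key=lambda o: o["mag"])  # brightest first
    batches = []
    while queue:
        batch = [queue.pop(0)]
        # Keep filling free cores while the next object in the queue is far
        # enough (min_dist plays the role of D20) from everything running.
        while queue and len(batch) < n_cores and can_start(queue[0], batch, min_dist):
            batch.append(queue.pop(0))
        batches.append([o["id"] for o in batch])
    return batches
```

The rank order is never violated: if the next object in the queue is too close to a running one, no fainter object is started in its place.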

These two approaches a) and b) are implemented in the
code as follows. Sersic fitting with GALAPAGOS is divided into
two parts:

In the first part (see Fig. 1, upper section of block D), GALAPAGOS
treats a fraction of all sources on all tiles in a sorted order
as laid out in method a). This ensures that the brightest galaxy
from tile B is fitted before GALAPAGOS treats tile A or B. This
part still requires sequential processing without the possibility
to run other instances of GALAPAGOS at the same time. Also, it



Figure 11. Definition of neighbouring tiles. For each survey image
the n closest neighbours define its immediate neighbourhood. In a
checker-board configuration n = 8 corresponds to a 3 × 3 pattern. At
the edge of the survey more distant tiles are included. The light green
shaded areas b indicate the neighbourhoods for the central tiles a.



produces a rather large computational overhead, as potentially
with every new source a number of large images (the complete
science image, weight image, segmentation map, etc., not the
postage stamps) have to be loaded into memory and processed
(for fitting the sky background). A possible working definition
for the fraction of sources that have to be fitted sequentially
might encompass all sources that span an area larger than the
size of the overlaps resulting from the survey's tiling scheme
(D12). This stage requires that all CPUs are able to see the
whole dataset, i.e. they must have access to the same hard discs,
because several threads are interacting with each other
and working on the same data.

The second part (see Fig. 1, lower section of block D) is
kept as detailed above in method b): GALAPAGOS processes all
objects within a tile in order of decreasing brightness. Several
instances of GALAPAGOS may be run simultaneously on different
tiles. With the sources that potentially reach into neighbouring
tiles already processed in the first part, survey images may now
be treated as individual entities, which can be processed out of
order and simultaneously. At this stage, one might think that,
as the tiles are decoupled, only a single tile is accessed at a
time. However, in the presence of big, bright sources that affect
neighbouring tiles, this is no longer the case. Therefore, even
when fitting individual tiles in parallel, the whole data set must
be accessible. As now only the information for the current tile is
changed, the fitting may nevertheless be distributed to different
hard discs (by creating identical copies).



Neighbouring Tiles 

During the sky background estimation GALAPAGOS calculates
the influence of all objects on the currently processed source.
Depending on the size of the survey, this check for contributors
takes up a significant fraction of the complete source loop
computation time. However, the sources immediately required for
processing the current object are only the ones that may have
an impact on the fitting or background estimation. Therefore,
specifying the "reach" of the brightest sources allows the
computation to be restricted to a much smaller fraction of all sources.
Figure 12. Correction of detection errors. Red crosses indicate
"critical" detections; blue boxes mark "catalogue" detections; green
circles show "good" source detections. For a definition of these terms
see Sec. 3.6. a: The diffraction spike of the star was picked up as
multiple individual sources. Most of them are critical failures. Yet,
one detection in each spike was kept as a catalogue source, in order
to guarantee that the fitting of nearby objects is not biased. b: Pixel
bleeding from the star. Again, some detections were flagged critical,
others as catalogue sources. c: An over-deblended object. The excess
detections are critical errors. d: Spurious detections in the vicinity of
the bright star. All are critical failures. Note that the categorisation
of sources is an optional, subjective process, which is not performed
automatically by GALAPAGOS.



example of these are diffraction spikes of stars, which may
not be included in the PSF model. Therefore, Galfit may not
properly fit a galaxy close to such a spike, as too much flux is
in the spike compared to the source. Satellite trails and pixel
column bleeding of saturated stars are also common. We show some
examples in Fig. 12a and b.

Galapagos can optionally take care of both these failures. If the
user provides a (manually created) list of positions for critical
and/or catalogue failures (one file each), GALAPAGOS will remove any
source found within a specified radius B16 around these positions
from the catalogues at the proper stage in the process. Otherwise,
GALAPAGOS treats them simply as normal sources and provides Sersic
fits for them. A cleaner, although potentially somewhat more time
consuming, approach would be to remove problematic regions from the
data altogether (e.g. by replacing them with white noise).
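
The positional removal can be sketched in a few lines of Python. The sketch assumes that catalogue and flagged positions are given in the same (e.g. pixel) coordinate system, which is a simplification; the function name is hypothetical and radius plays the role of B16.

```python
import numpy as np

def drop_flagged(catalogue_xy, flagged_xy, radius):
    """Return a boolean array: True for catalogue sources that are kept."""
    cat = np.asarray(catalogue_xy, dtype=float).reshape(-1, 2)
    bad = np.asarray(flagged_xy, dtype=float).reshape(-1, 2)
    if len(bad) == 0:
        return np.ones(len(cat), dtype=bool)
    # Distance of every catalogue source to every flagged position.
    dist = np.hypot(cat[:, None, 0] - bad[None, :, 0],
                    cat[:, None, 1] - bad[None, :, 1])
    # Keep a source only if no flagged position lies within the radius.
    return dist.min(axis=1) > radius
```

For critical failures the resulting selection would be applied before the fitting, for catalogue failures only to the final output catalogue.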

To classify unwanted detections (into one of the two categories),
the user should decide whether or not an object is required for
obtaining a proper fit with Galfit for neighbouring "real" sources.
In principle it is safe to put any detection error into the
catalogue failure list. This might lead to prolonged fitting times,
though. In practice, most detection errors are faint enough to not
influence neighbours and should therefore be put into the list of
critical failures.

Note that the definition of whether an object is a critical or 
catalogue failure is subjective and depends on the user. How- 
ever, the correction of these errors is an option. GALAPAGOS 
will run perfectly well without any manual treatment. In that 
case, the user will have to live with the fact that some (small) 
fraction of sources might be affected by this. 



This is done by providing the total number of closest tiles n that
are to be included in the calculation (D18). If the tiles are taken
on a regular grid, n = 8 defines a ring surrounding the tile of
the current object (see Fig. 11). Note that in case of a tile at the
edge of the survey this "ring" is not cut in half, but all tiles are
selected on just one side. GALAPAGOS always selects the nearest
n neighbours. It calculates the distance between tiles from the
centres of the images.
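
A minimal Python sketch of this neighbour selection, assuming tile centres are given as planar coordinates (the function name and the input layout are hypothetical; n corresponds to D18):

```python
import numpy as np

def nearest_tiles(centres, index, n=8):
    """Indices of the n tiles whose centres are closest to tile `index`."""
    centres = np.asarray(centres, dtype=float)
    dx, dy = (centres - centres[index]).T
    order = np.argsort(np.hypot(dx, dy), kind="stable")
    # Drop the tile itself and keep the n nearest of the remaining tiles.
    return [int(i) for i in order if i != index][:n]
```

On a regular 5 × 5 grid, the central tile obtains the surrounding ring of eight tiles, whereas a corner tile obtains eight tiles that all lie towards the survey interior, including some from the second row and column.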

Detection Flags 

A perfect setup for SExtractor never exists. In a small fraction
of all detections one failure or another occurs, e.g.
(over-)deblending, non-detection, spurious detection, etc. These
errors accumulate in particular in the surroundings of bright stars
(or even galaxies). With respect to setting up Galfit
properly, there are two classes of failures: the "critical" and
the "catalogue" failures. Depending on their relative brightness
compared to nearby "real" objects they either have to be
removed before the fitting (faint sources; "critical") or after
(bright sources; "catalogue").

A critical failure is a detection error that should be corrected
before running Galfit. Critical detections do not affect the fitting
of neighbouring real sources. Examples are over-deblends, cosmic
rays or a bad detection at the image edge. Critical failures include
any unwanted detection that might erroneously add unnecessary
components to the fitting of real objects. We give an example of an
over-deblended source in Fig. 12c and indicate several spurious
detections in Fig. 12d.

In contrast, catalogue failures are detections that one has 
to remove after running Galfit. They are bright in relation to 
neighbouring sources and they might affect the fitting of nearby 
objects if not included as separate components. Typically, they 
are connected to cosmetic "defects" of the image. A common 



Treatment of Stars 

A problem related to fitting bright saturated stars is that they
are often much brighter than the stars that one can use as
a PSF model. Because there is a limited dynamic range, the
PSF cannot adequately capture the tails seen around brighter
stars, which may then contaminate neighbouring galaxies. To
deal with this situation, we fit Sersic models to stars instead of
the usual PSF model, because a high Sersic index produces a
model with extended tails. However, doing so may prevent
Galfit from converging within a reasonable amount of time.
As the focus of Galapagos is on modelling the properties of
galaxies, no further attempt was made to apply a different, more
elaborate model (instead of the Sersic profile).

To resolve the resulting problem with the convergence of Galfit,
Galapagos identifies saturated stars in the magnitude-size plane,
which is represented by the SExtractor parameters mag_best and
log(fwhm_image) (see Fig. 13). The user specifies the zeropoint D15
and slope D14 of a line below which Galapagos treats objects as
saturated stars (i.e. on the bright and compact/small side). The
reason for many of the brighter stars to fail in the fitting is the
detection of a large number of neighbouring secondary sources
(including stellar diffraction spikes), which have to be modelled
simultaneously. To reduce the number of these secondaries, the user
specifies a relative magnitude cut D17, below which secondaries are
no longer fitted and are treated as tertiaries. For the STAGES data,
all objects more than two magnitudes fainter than the primary star
(m_object - m_star > 2) were subject to this. Note that for galaxies
the same limitation applies, but at a much weaker level. Again a
magnitude limit D16 may be specified (e.g. m_object - m_galaxy > 5).
Restricting the number of secondaries to those objects bright enough
to influence the fit and removing the fainter ones resolves the
issue.
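
Both criteria can be written down compactly. In the Python sketch below the dividing line is assumed to be expressed as log10(fwhm_image) as a function of mag_best, which is our reading of the magnitude-size plane; the function names are hypothetical, and the default cuts are the example values quoted above for D16 and D17.

```python
import math

def is_saturated_star(mag_best, fwhm_image, slope, zeropoint):
    """True if the object lies below the line in the magnitude-size plane.

    slope and zeropoint play the roles of D14 and D15; the orientation of
    the line (size as a function of magnitude) is an assumption.
    """
    return math.log10(fwhm_image) < slope * mag_best + zeropoint

def keep_as_secondary(mag_primary, mag_object, primary_is_star,
                      cut_galaxy=5.0, cut_star=2.0):
    """False if the object is too faint relative to the primary and is
    therefore downgraded to a tertiary (masked instead of fitted)."""
    cut = cut_star if primary_is_star else cut_galaxy
    return (mag_object - mag_primary) <= cut
```

With these cuts, an object three magnitudes fainter than its primary is masked if the primary is a star but still fitted simultaneously if the primary is a galaxy.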
Figure 14. Parameter recovery as a function of simulated mean surface brightness μ_sim within Re (left panel) and simulated magnitude m_sim
(right panel) for two different Sersic index ranges (disc-like galaxies with n ≈ 1 on the left-hand side, early-type galaxies with n ≈ 4 on the
right-hand side). Grey levels show galaxy density, with each bin being normalised to its own peak value. As a result, grey levels roughly resemble
a mean value and a measure for the scatter of the distribution. Due to an asymmetric distribution and different binning, the true mean value
(black line) deviates slightly from the peak values for fainter galaxies. The 1-σ scatter of the distributions is shown as well (dashed lines). The
light grey line indicates the ideal zero-level. Fainter galaxies (both as function of magnitude and surface brightness) and galaxies with higher n
are fitted less accurately. Also, for the brightest galaxies in the sample, the deviation increases. Most likely their brightness (and size) makes
them the most difficult objects to set up for fitting, because of a large number of simultaneously fitted neighbours and because of having the
highest uncertainty in their background sky estimate.
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Figure 15. Sky recovery (flux difference in counts) as a function of simulated galaxy Sersic index {left panel) and simulated magnitude {right 
panel). Contours and black lines show the distribution, mean and cr of the estimated sky as recovered by GALAPAGOS, white/grey dashed lines 
indicate mean and a as provided by SExTRACTOR. GALAPAGOS recovers a very accurate sky value independent of galaxy structure, whereas 
SExTRACTOR overall exhibits a much larger offset, scatter and dependence on galaxy morphology. At the brightest magnitudes a slight trend is 
seen in GALAPAGOS while SExTRACTOR performs worse by a factor of ~50. Note that the right panel shows only objects with 3 < n < 5, and 
thus portrays a rather conservative scenario. 



4 DATA QUALITY 

We have tested Galapagos thoroughly using simulated data as described
in more detail in Haussler et al. (2007) and Gray et al. (2008). For
the simulations applied here, we use the same setup as for fitting the
STAGES survey. Analytical Sersic profiles are randomly placed on a
background image composed from patches of blank sky from real data.
The galaxy models are convolved with the same PSF as the original
STAGES data (before placing them on the background). In addition,
Poisson noise was added to each pixel of the galaxy models. The galaxy
model parameters randomly cover the same parameter space as the
original STAGES data, with an extension towards low fluxes and surface
brightnesses so as to cover the full completeness space.

Figure 13. Treatment of saturated stars. Here we show source detections
from the STAGES survey in the log(fwhm_image) vs. mag_best plane. Red
pluses mark objects identified as stars (see Gray et al. 2008). The
line indicates the cut used to identify saturated stars (left of the
line). Black dots show other (extragalactic) sources.

All in all, the simulated datasets contain around 7 million galaxies.
Excluding the ones that are not recovered by GALAPAGOS for being below
the detection threshold (3 million), the ones that ran into any given
fitting constraint (~280 000; the following constraints for the Sersic
index n, the half-light radius Re and the magnitude m were applied:
0.2 < n < 8, 0.3 < Re [pix] < 750, |m_GALFIT - m_SExtractor| < 5) and
the ones where the fit crashed (293) leaves us with around 3.7 million
successfully fitted galaxies.
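
The selection described above can be written as a simple filter. The following Python sketch is purely illustrative and assumes that a fit counts as having "run into a constraint" when it ends on or outside the quoted limits; the function name is hypothetical, and the counts are the rounded numbers quoted in the text.

```python
def hit_constraint(n, re_pix, mag_galfit, mag_sextractor):
    """True if a fit result lies on or outside the quoted constraint limits."""
    within = (0.2 < n < 8.0
              and 0.3 < re_pix < 750.0
              and abs(mag_galfit - mag_sextractor) < 5.0)
    return not within

# Bookkeeping with the approximate numbers from the text.
simulated = 7_000_000
undetected = 3_000_000
constrained = 280_000
crashed = 293
remaining = simulated - undetected - constrained - crashed
```

The remainder, 3 719 707, is consistent with the quoted "around 3.7 million" successfully fitted galaxies.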

The left panel of Fig. 14 shows the deviations of the three most
important fitting parameters magnitude m, effective radius Re and
Sersic index n as a function of simulated mean surface brightness
mu_sim within Re for two different regimes of Sersic index. We choose
the samples such that the completeness as a function of magnitude is
roughly 90% for all galaxies. The low Sersic index sample
(m_sim < 24.5) contains ~1.1 million galaxies, the high Sersic sample
(m_sim < 25.25) contains ~470 000 galaxies. Obviously, Galapagos'
performance decreases at faint magnitudes and high Sersic indices. The
right panel of Fig. 14 shows the same plot, but as a function of
simulated magnitude m_sim rather than surface brightness to illustrate
the same effects in another commonly used parameter space. Again, we
choose a cut to select only galaxies with a surface brightness
completeness exceeding 90%. The low Sersic index sample
(mu_sim < 22.25) contains ~780 000 galaxies, the high Sersic sample
(mu_sim < 23) contains ~295 000 galaxies. At the faint end, quite
expectedly, the recoverability of parameters gets worse. In both
panels of Fig. 14 we see no significant systematic trends apart from
the faintest levels.

The left panel in Fig. 15 shows the deviation of the sky value (as
recovered by Galapagos) from the true sky value as a function of the
simulated Sersic index n of the primary object. We derive the true sky
value from simple statistics on the empty noise image used for the
galaxy simulations. The recovery of the sky in GALAPAGOS shows no
dependence on n. Compared to the SExtractor value for the local sky,
which shows both a much bigger offset and a larger standard deviation,
the recovery is close to ideal with very small offset and scatter.

Furthermore, we investigate the magnitude dependence of the sky
recoverability (see right panel of Fig. 15). Here we select only
objects with a Sersic index 3 < n < 5 (~300 000 objects), which due to
their extended low surface brightness wings are the hardest to fit and
to estimate a background value for. Thus, we portray a conservative
worst case scenario. While for the large majority of all objects there
is no trend to be seen at all, at the bright end the estimates
provided by GALAPAGOS do diverge slightly: at m = 17, 18 the mean sky
moves off by ~0.04/0.03 with a scatter of ~0.02, respectively. For
comparison, the values recovered by SExtractor are ~2.3/1.3 with a
scatter of ~0.4 at the same brightness.

Figure 16. Parameter deviations as a function of both distance
(left panel) and magnitude (right panel) of the nearest neighbouring
galaxy. The thick grey/white dashed lines indicate the zero-level;
black solid and dashed lines show the mean and sigma of the recovered
parameter, respectively. Grey contours represent the normalised
distribution of recovered parameter values.

To examine the influence of neighbouring galaxies in a similar fashion
as was shown in Haussler et al. (2007), we plot parameter deviations
over both magnitude of and distance from the next neighbour. We define
the next neighbour here as the closest simulated galaxy that was found
by SExtractor. This does not necessarily imply that this galaxy had to
be properly deblended and simultaneously fitted when running Galfit
(i.e. assuming a rather conservative definition resulting in a worst
case scenario). We show these deviations in Fig. 16. In contrast to
the analysis in Haussler et al. (2007), we now have enough statistical
significance to separate both effects. We only show neighbours with
21 < m_sim < 23 (~330 000 galaxies) in the left panel and neighbours
with a distance 1 < d [arcsec] < 1.6 (~280 000 galaxies) in the right
panel of Fig. 16 in order not to confuse the two distinct effects:
contamination by bright neighbours and contamination by close
neighbours. As one can see from these plots, Galapagos results do not
show any dependence on either of these parameters. From this plot we
conclude that the deblending and fitting scheme applied in GALAPAGOS
works well and that successful deblending of clustered fields (as e.g.
STAGES) is possible.



5 PERFORMANCE 

We measure the performance of GALAPAGOS when applying it to the
single-orbit HST survey STAGES (see Gray et al. 2008). STAGES is a
mosaic composed of 80 tiles in the F606W filter containing ~75 000
sources. Since the survey is centred on a nearby galaxy cluster system
at redshift z ~ 0.16, it provides the ideal test case including a high
fraction of large and also peculiar objects. Large objects serve as a
test for the deblending process during the source extraction, while
peculiar objects like mergers or saturated stars with diffraction
spikes pose a challenge for the modelling with Galfit. The total
wall-clock time for processing this survey is ~430 hours. Details are
given below.

Figure 17. Performance of the galaxy modelling with Galfit. Cumulative
histogram of the fitting time per object as a fraction of the total
fitting time. The two histograms show all galaxies (black/left) and
the brightest 5% (green/right). 50% of all sources take less than
1.25 min to fit and more than 90% of all objects are done within
10 min (red lines). The brightest 5% take about a factor of 5 longer.

The fitting process with Galfit is the main limitation for Galapagos'
performance. The largest amount of time was spent on fitting the
fainter 95% of all sources in the parallel mode (second part of block
(D) in Fig. 1). Using eight 2.2 GHz CPU cores in parallel, this
process (i.e. the slowest of the eight) takes ~260 hours. There is
potential for further improvement by increasing the total number of
CPU cores. This would also reduce the overhead resulting from
individual pipelines not finishing at the same time (i.e. pipelines
with fewer sources finish sooner), which normally results in far fewer
than the total number of available CPUs running simultaneously at the
end of the fitting.

For the first part of block (D), the fitting of the brightest 5% of
all sources, we used four 2.4 GHz CPU cores. This part of the fitting
takes ~150 hours. Note that moving from four CPU cores to eight does
not necessarily imply halving the required computation time. The
performance increase at this stage depends on the survey geometry. A
wide area survey has a higher efficiency than a smaller survey of the
same depth, because of the higher probability that the brightest
objects in the survey are further apart from each other, thus allowing
a higher multiplicity. Fig. 17 shows a cumulative histogram of the
fitting time per object. Note that the brightest objects take
considerably longer to fit than the rest, which explains the necessity
to find a good compromise between the time spent in the two stages.

The remaining blocks take up an almost negligible fraction of the
total processing time. Block (B), the SExtractor stage, takes ~13.5
hours, including HDR mode. Cutting the postage stamps in block (C)
requires ~2.5 hours and the last block (F), compilation of the output
catalogue, finishes within ~0.1 hours. Note that overheads for
adjusting the setup and preparing the parallel fitting are not taken
into account in the numbers cited above. Also, for varying survey
layouts/configurations the relative fractions of the total processing
time spent in the various stages might vary significantly.



6 SUMMARY 

We present GALAPAGOS, a software for automating the pro- 
cess of detecting sources and modelling them with single Sersic 
profiles. Galapagos incorporates SExtractor and Galfit to 
perform these two tasks. In addition, it provides HDR source 
extraction, a postage stamp cutting facility and a robust means 
of estimating a local sky background. It stores results in a com- 
bined FITS table, excluding duplicates resulting from detec- 
tions in overlapping tiles. We optimised the code for speed and 
stability, making use of modern multi-core CPUs and allowing 
a high degree of multiplicity. Another aim was to present the 
user with a simple setup, yet enabling control over all features 
of the code. As a result, Galapagos can be used on a wide vari- 
ety of survey applications, from single tile deep observations to 
wide area shallow surveys. Galfit's ability to work with any 
given PSF enables application of GALAPAGOS to both space- 
and ground-based data. The PSF has to be prepared by the 
user before running GALAPAGOS, though. This procedure is not 
part of the code. 

We tested GALAPAGOS on an extensive set of simulations 
and find it to be extremely robust in terms of parameter re- 
coverability. Note that the results of the fitting depend on the 
choice of the input parameters. For example, a bad SExtrac- 
tor setup will have a significant impact on the fitting procedure 
and thus lower the quality of the output catalogue. 

The main feature that will be implemented in GALAPAGOS 
in the near future is the option for a consistent two-component 
bulge-disc fitting. This will also include an estimator providing 
information about whether the increased amount of data po- 
tentially allows further insight into the structural composition 
of the object or not. Based on this idea, we will also investigate 
the automated fitting of bars and the application of Fourier 
mode fitting, built into the most recent version of Galfit. 

Another potential aspect for increasing the versatility of 
Galapagos could be the implementation of a variable PSF. 
Currently, just one PSF is used for convolving the Galfit 
model profiles for the whole survey. Instead, one could allow 
using a different PSF depending on the position on the tile or 
even varying tile by tile. 

Galapagos is freely available for download from our webpage at:
http://astro.uibk.ac.at/~m.barden/GALAPAGOS/
APPENDIX A: CODE SETUP AND CONTROL

In the following section we provide detailed information about 
how Galapagos is run. This includes a description of the struc- 
ture of setup files and the execution sequence. 



A1 The Setup Script

Galapagos is controlled by a set of scripts. We show an example for
the main startup script in Fig. A1. It contains all references for
file locations and manages the programme execution. The startup script
is divided into six parts, (A) through (F), closely related to the
four blocks described in Sec. 5. The first set of parameters (A)
defines the input and output file locations. Section (E) contains
options that help setting up Galfit. These two parameter sets are
"static"; the remaining four are "dynamic" in the sense that these can
be activated or skipped when executing Galapagos. They correspond to
the four programme blocks that were previously defined in Sec. 5.
Parameter set (B) starts SExtractor; (C) is responsible for defining
and cutting the postage stamps; (D) performs the estimation of the sky
background, prepares Galfit and starts the fitting; finally, parameter
set (F) reads out the fit results and creates the output catalogue.
The difference between the "static" and "dynamic" parameter sets is
that the latter control code execution while the former only define
file locations and setup parameters.
Note that the setup files for SExtractor B03 (and optionally B06 in
HDR mode) require additional files to be accessible. These are the
neural network file "default.nnw" and an optional convolution filter,
e.g. "tophat_3.0_3x3.conv". For details on how to set up SExtractor
see Bertin & Arnouts (1996).



A2 The File List 

A file location list, or "file list" for short (A00), provides the
information required for defining the organisation of the survey. It
is an ASCII file providing for each survey pointing the file location
of the actual tile, the corresponding weight image (which is an
exposure time map), a path for storing the fitting output of the
individual tiles, and a prefix, which is attached to all output files.
We give an example of such a file list in Fig. A2 for a hypothetical
survey with 10 science tiles.



A3 Catalogue Finetuning 

In order to refine the output of SExtractor, the user has the 
option to remove sources. After an initial run with the optimal 
SExtractor setup, the user may create a list that contains 
the file name of the respective tile together with an x/y-pixel 
position, e.g.: 

/path/to/survey/tile01.fits 234 567
/path/to/survey/tile01.fits 765 432
/path/to/survey/tile02.fits 453 678



Galapagos rejects any detection within a certain radius 
B16 automatically from the SExtractor catalogue on a sub- 
sequent run of the code. Thus, if one wants to refine the cat- 
alogue, the SExtractor section of the code has to be run 
twice, i.e. Galapagos needs to be started first with only the 
SExtractor section activated and then run a second time 
with the SExtractor section and optionally others enabled 
as well. The first execution is required for identifying bad de- 
tections; the second run then treats them. For details on what 
sources should be removed and how the code deals with them 
see Sec. 3.6.
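
As an illustration of the rejection step, the following Python sketch reads a list in the format shown above and drops every detection that lies within the exclusion radius B16 of a listed position on the same tile. It is a minimal sketch, not the actual IDL code: the function names and the dictionary keys "x" and "y" are hypothetical, and treating the radius as inclusive is an assumption.

```python
import math

def read_exclude_list(text):
    """Parse lines of the form '<tile file> <x> <y>'."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip empty or malformed lines
        entries.append((parts[0], float(parts[1]), float(parts[2])))
    return entries

def reject_detections(detections, exclude, tile, radius=1.5):
    """Keep only detections farther than `radius` pixels from
    every listed position on the given tile."""
    bad = [(x, y) for name, x, y in exclude if name == tile]
    kept = []
    for det in detections:
        near_bad = any(math.hypot(det["x"] - bx, det["y"] - by) <= radius
                       for bx, by in bad)
        if not near_bad:
            kept.append(det)
    return kept

exclude = read_exclude_list(
    "/path/to/survey/tile01.fits 234 567\n"
    "/path/to/survey/tile02.fits 453 678\n"
)
catalogue = [{"x": 234.5, "y": 567.2}, {"x": 300.0, "y": 300.0}]
cleaned = reject_detections(catalogue, exclude, "/path/to/survey/tile01.fits")
```

Only the second detection survives, since the first lies about 0.5 pixels from a listed position.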



A4 Batch Processing 



In Sec. 3.6 we explain in some depth the mechanisms to optimise the
total programme execution time. According to this scheme, after
cutting the postage stamps and fitting a subset of all sources, namely
the brightest objects, GALAPAGOS may be run in parallel on several
computers, each working on a section of the survey. In order to
specify the region that the respective pipeline should work on, the
user must provide a list E01 that contains the file names of the
individual tiles in question. If one were to fit the survey from
Fig. A2 in parallel on three CPU cores, one could set up the first
computer with a batch list containing tiles 1 to 3, the second one
with 4 to 6 and the third one with 7 to 10. As an example, the batch
list for the first CPU core would look like this:

/path/to/survey/tile01.fits
/path/to/survey/tile02.fits
/path/to/survey/tile03.fits
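
Batch lists of this kind are easy to generate automatically. The Python sketch below splits a list of tiles into contiguous groups and reproduces the 3/3/4 split of the example above. It is only an illustration under the assumption that surplus tiles go to the last batch; Galapagos itself does not prescribe how the tiles are distributed, and the function name is hypothetical.

```python
def make_batch_lists(tiles, n_batches):
    """Split tiles into n_batches contiguous groups; the last
    group takes any tiles left over by the integer division."""
    if n_batches < 1 or n_batches > len(tiles):
        raise ValueError("need between 1 and len(tiles) batches")
    base = len(tiles) // n_batches
    batches = []
    for i in range(n_batches):
        start = i * base
        stop = (i + 1) * base if i < n_batches - 1 else len(tiles)
        batches.append(tiles[start:stop])
    return batches

tiles = [f"/path/to/survey/tile{i:02d}.fits" for i in range(1, 11)]
batches = make_batch_lists(tiles, 3)

# One file name per line, as expected for a batch list.
first_batch_text = "\n".join(batches[0])
```

Each resulting list can then be written to its own file and referenced in E01 of the startup script for the respective machine.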



A5 An Example Sequence 

To summarise, the process to run GALAPAGOS on a complete 
survey requires the following steps: 

i) set up startup script & file list (incl. SExtractor)
ii) run first block (B) (optionally in HDR mode)
iii) optionally identify "bad" detections
iv) manually create the respective "bad detection lists"
v) if "bad detection lists" were created re-run block (B)
vi) run block (C) to prepare & cut postage stamps
vii) run block (D) on brightest galaxies
viii) create batch lists for parallel processing
ix) create startup scripts for batch lists
x) re-run block (D) in parallel on several machines
xi) when parallel processing is finished, run block (F)
#=========================== FILE LOCATIONS ===========================
A00) /path/to/survey/setup/gala_files  #for an example see Fig. A2
A01) /path/to/survey/cat  #output directory for catalogues
#========================== SEXTRACTOR SETUP ==========================
B00) execute  #execute the SExtractor block
B01) /path/to/SExtractor-binary/sex  #SExtractor executable including path
B02) /path/to/survey/setup/gala.param  #output parameters in .param-format
B03) /path/to/survey/setup/coldsex  #SExtractor setup file (cold)
B04) coldcat  #output catalogue (cold)
B05) coldseg.fits  #output segmentation map (cold)
B06) /path/to/survey/setup/hotsex  #SExtractor setup file (hot)
B07) hotcat  #output catalogue (hot)
B08) hotseg.fits  #output segmentation map (hot)
B09) 1.1  #enlarge the cold isophotes for catalogue combination by a factor
B10) outcat  #output combined catalogue
B11) outseg.fits  #output combined segmentation map
B12) outparam  #output parameter file
B13) check.fits  #check image filename
B14) apertures  #check image type
B15) /path/to/survey/setup/gala_exclude  #path and filename of list of "critical"
#detections (removed before the fitting). Set to non-existing file if not required
B16) 1.5  #radius in pix used to exclude objects
B17) all  #if set 'outonly': hot/cold catalogues/segmaps are deleted
B18) /path/to/survey/setup/gala_bad  #path and filename of list of "catalogue"
#detections (removed after the fitting). Set to non-existing file if not required
B19) sexcomb  #combined sextractor catalogue
#============================ STAMP SETUP =============================
C00) execute  #execute the Stamps creation block
C01) stamps  #descriptor file for postage stamps
C02) v  #preposition for postage stamps
C03) 2.5  #scale factor by which the sextractor isophotes are enlarged
#======================= SKY PREPARATION SETUP ========================
D00) execute  #execute the sky preparation block
D01) skymap  #output object/sky-mapfile
D02) outsky  #output filename for sky values
D03) 3  #scale factor by which SEx isophote is enlarged (for skymap)
D04) 1.5  #scale factor by which SEx isophote is enlarged (for neighbours)
D05) 20  #additional offset to scale factor
D06) 30  #distance between individual sky isophotes
D07) 60  #width of individual sky isophotes
D08) 30  #gap between sextractor isophote and inner sky isophote
D09) 2  #cut below which objects are considered as contributing
D10)  #nobj_max; max number of allowed contributing sources
D11) 1.4  #power by which the flux_radius is raised to convert to Re
D12)  #fraction of sources to be treated first (in %; 5 = 5%)
D13) 15  #calculate the slope of the sky from the x last determinations
D14) -0.3  #slope in fwhm_image vs. mag_best below which object is star
D15) 6.8  #zeropoint in fwhm_image vs. mag_best below which object is star
D16) 5  #magnitude faint end limit for secondaries when fitting galaxies
D17)  #magnitude faint end limit for secondaries when fitting stars
D18) 8  #number of neighbouring frames
D19) 4  #maximum number of parallel processes
D20) 360  #minimum distance between sources (in arcseconds) for D19)
#============================ GALFIT SETUP ============================
E00) /path/to/galfit-binary/galfit  #Galfit executable including path
E01) /path/to/survey/setup/batchlist.XX  #tile batch list
E02) obj  #object file preposition
E03) v_gf  #preposition for GALFIT output files
E04) /path/to/survey/setup/psf.fits  #PSF filename including path
E05) mask  #mask file preposition
E06) constr  #constraint file preposition
E07) 257  #convolution box size
E08) 26.486  #zeropoint
E09) 0.03  #plate scale of the images [arcsec/pixel]
E10) 706  #exposure time
E11) 750  #constraint max Re
E12) -5  #constraint min magnitude deviation (minus)
E13) 5  #constraint max magnitude deviation (plus)
E14) notnice  #nice?
E15) 2.1c  #GALFIT version string. E.g. 2.0.3c
#======================= OUTPUT CATALOGUE SETUP =======================
F00) execute  #execute catalogue combination block
F01) combcat.fits  #filename for output catalogue in A01)

Figure A1. Example of a startup script for GALAPAGOS.
/path/to/survey/tile01.fits /path/to/survey/wht01.fits /path/to/survey/t01 t01.
/path/to/survey/tile02.fits /path/to/survey/wht02.fits /path/to/survey/t02 t02.
/path/to/survey/tile03.fits /path/to/survey/wht03.fits /path/to/survey/t03 t03.

/path/to/survey/tile10.fits /path/to/survey/wht10.fits /path/to/survey/t10 t10.

Figure A2. Example of a file location list for GALAPAGOS. The middle section is left out. In this example a survey with 10 tiles is defined.
The four columns represent science image, corresponding weight image (exposure time map), output directory and output file preposition,
respectively.

Note that if the survey is small enough, steps 6 to 8 may be 
combined, either by setting the brightest galaxies fraction D12 
to 100%, thus taking full advantage of the available CPU cores 
on the machine or by simply providing only one batch file con- 
taining all tiles. The latter option does not provide advantages 
over the first one and is best used only for testing purposes. 

APPENDIX B: STARTING PARAMETERS 

In this appendix we give a detailed description of the starting
parameters in the Galapagos startup file. Lines starting with "#" are
treated as comments and are ignored by the code. Examples for
SExtractor related setup files (items B02, B03, B06 in the table
below) can be found in the corresponding documentation (Bertin &
Arnouts 1996). In order not to execute the blocks B00, C00, D00 or F00
the user should either replace "execute" with something else or simply
comment out the respective line, e.g. "#B00) execute". Files in the
table below without a directory descriptor can be found in the output
directory defined for each survey image in the file list unless
otherwise noted.
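
To make the file format concrete, the following Python sketch reads a startup file of the form shown in Fig. A1. It is a hypothetical reader written for illustration only (Galapagos itself parses the file in IDL): it assumes each active line has the form "KEY) value #comment", skips lines starting with "#", and treats a block as enabled only if its value is exactly "execute".

```python
def parse_startup(text):
    """Return a dict mapping item labels (e.g. 'B00') to values."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        key, sep, rest = line.partition(")")
        if not sep:
            continue  # not a 'KEY) value' line
        # Drop the trailing inline comment, if any.
        value = rest.split("#", 1)[0].strip()
        params[key.strip()] = value
    return params

def block_enabled(params, key):
    """A block runs only if its item is present and set to 'execute'."""
    return params.get(key) == "execute"

example = """\
B00) execute  #execute the SExtractor block
B16) 1.5  #radius in pix used to exclude objects
#C00) execute  #execute the Stamps creation block
D00) skip  #execute the sky preparation block
"""
params = parse_startup(example)
enabled = [k for k in ("B00", "C00", "D00") if block_enabled(params, k)]
```

In this example only block B00 is enabled: C00 is commented out and D00 has "execute" replaced by another word, the two ways of disabling a block described above.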



© 2012 RAS, MNRAS 000



File Locations

A00) /path/to/survey/setup/gala_files
     setup of the survey tiling (path and filename). For an example
     see Fig. A2
A01) /path/to/survey/cat
     output directory for catalogues. In this directory, the combined
     SExtractor catalogue (item B19); in ASCII format) and the final
     output catalogue (item F01); in FITS format) are placed



SExtractor Setup

B00) execute
     execute the SExtractor block
B01) /path/to/SExtractor-binary/sex
     path and filename of SExtractor executable
B02) /path/to/survey/setup/gala.param
     path and filename of SExtractor output parameters in
     .param-format
B03) /path/to/survey/setup/coldsex
     path and filename of SExtractor setup file (cold)
B04) coldcat
     filename of SExtractor output catalogue (cold)
B05) coldseg.fits
     filename of SExtractor output segmentation map (cold)
B06) /path/to/survey/setup/hotsex
     path and filename of SExtractor setup file (hot)
B07) hotcat
     filename of SExtractor output catalogue (hot)
B08) hotseg.fits
     filename of SExtractor output segmentation map (hot)
B09) 1.1
     factor by which the cold isophotes are enlarged when combining
     hot/cold catalogues
B10) outcat
     filename of combined SExtractor output catalogue
B11) outseg.fits
     filename of combined SExtractor output segmentation map
B12) outparam
     filename of SExtractor output parameter file
B13) check.fits
     filename of SExtractor check image
B14) apertures
     type of SExtractor check image
B15) /path/to/survey/setup/gala_exclude
     path and filename of list of "critical" detections (removed
     before the fitting). Set to non-existing file if not required
B16) 1.5
     radius in pix used to exclude "bad" detections
B17) all
     if set "outonly": hot/cold catalogues/segmaps are deleted, else
     all files are kept
B18) /path/to/survey/setup/gala_bad
     path and filename of list of "catalogue" detections (removed
     after the fitting). Set to non-existing file if not required
B19) sexcomb
     filename of combined SExtractor catalogue. Output directory is
     A01)



Stamp Setup

C00) execute
     execute the postage stamps creation block
C01) stamps
     output descriptor file for postage stamps. Per line, this ASCII
     file contains: SExtractor number, x/y source centre, x-range,
     y-range
C02) v
     filename preposition for postage stamps. E.g. for C02) = "v", a
     global file preposition "im1." (from file list) and SExtractor
     detection number "234", the output filename would be:
     "im1.v234.fits"
C03) 2.5
     scale factor by which the SExtractor isophotes (Kron ellipses)
     are enlarged to calculate postage stamp size



Sky Preparation

D00) execute
     execute the sky preparation block
D01) skymap
     filename of output object/sky-mapfile
D02) outsky
     filename of output list with sky values
D03) 3
     scale factor by which SExtractor isophote is enlarged (for
     calculating the skymap)
D04) 1.5
     scale factor by which SExtractor isophote is enlarged (for
     neighbouring source treatment)
D05) 20
     Definition of sky isophotes: additional offset to scale factor
     (in pix), for sky measurement
D06) 30
     Definition of sky isophotes: distance between individual sky
     isophotes
D07) 60
     Definition of sky isophotes: width of individual sky isophotes
D08) 30
     Definition of sky isophotes: gap between SExtractor isophote and
     inner sky isophote
D09) 2
     cut below which objects are considered as contributing to the
     actual primary source
D10)
     max number of allowed contributing sources per primary source
D11) 1.4
     power by which the flux_radius is raised to be converted to a
     half-light radius
D12)
     fraction of sources to be treated first (in %; "5" = 5%), using
     multiple CPUs
D13) 15
     calculate the slope of the sky from the x last determinations
D14) -0.3
     slope in FWHM_IMAGE vs. MAG_BEST below which an object is
     considered a star. Used for treating secondary sources
D15) 6.8
     zeropoint in FWHM_IMAGE vs. MAG_BEST below which an object is
     considered a star. Used for treating secondary sources
D16) 5
     magnitude faint end limit for secondaries when fitting galaxies.
     Objects more than x mag fainter than the primary are not included
     as secondary sources but tertiaries
D17)
     magnitude faint end limit for secondaries when fitting stars. See
     also D16)
D18) 8
     number of neighbouring tiles. See Sec. 3.6
D19) 4
     maximum number of parallel processes for fitting the brightest
     sources, defined by D12)
D20) 360
     minimum distance (in arcseconds) between all simultaneously
     fitted sources for D19). If current source in fitting queue is
     closer, fitting is delayed until other sources are done and the
     criterion is fulfilled



Galfit Setup

E00) /path/to/galfit-binary/galfit
     path and filename of Galfit executable
E01) /path/to/survey/setup/batchlist.XX
     path and filename of a batch list, used for parallel processing
     of several tiles. For an example see Sec. A4
E02) obj
     object file preposition. E.g. for E02) = "obj", a global file
     preposition "im1." (from file list) and SExtractor detection
     number "234", the output filename would be: "im1.obj234"
E03) v_gf
     preposition for Galfit output files. E.g. for E03) = "v_gf", a
     global file preposition "im1." (from file list) and SExtractor
     detection number "234", the output filename would be:
     "im1.v_gf234.fits". Note, Galfit output files contain 3 FITS
     extensions: the original image, the model (including fit
     parameters in the header) and the residual image
E04) /path/to/survey/setup/psf.fits
     PSF filename including path
E05) mask
     mask file preposition used in Galfit. E.g. for E05) = "mask", a
     global file preposition "im1." (from file list) and SExtractor
     detection number "234", the output filename would be:
     "im1.mask234.fits"
E06) constr
     constraint file preposition. E.g. for E06) = "constr", a global
     file preposition "im1." (from file list) and SExtractor detection
     number "234", the output filename would be: "im1.constr234"
E07) 257
     size of PSF convolution box
E08) 26.486
     magnitude zeropoint
E09) 0.03
     plate scale of the images [arcsec/pixel]
E10) 706
     effective exposure time (after image reduction, multidrizzling,
     etc.)
E11) 750
     constraint: maximum allowed half-light radius
E12) -5
     constraint: minimum magnitude deviation (minus) from SExtractor
     measurement, i.e. the fit magnitude is constrained to not more
     than E12) mag brighter than the SExtractor value
E13) 5
     constraint: maximum magnitude deviation (plus) from SExtractor
     measurement. See also E12)
E14) notnice
     use the UNIX facility "nice" when starting the fitting with
     Galfit. Set E14) to "nice" to activate "nicing"
E15) 2.1c
     Galfit version string. E.g. 2.0.3c
Output Catalogue Setup

F00) execute
     execute catalogue combination block
F01) combcat.fits
     filename of combined FITS output catalogue. Output directory is
     A01)